Background and Question

Since 2012, the Centers for Medicare & Medicaid Services (CMS) has administered the Hospital Readmissions Reduction Program (HRRP). The program tracks hospital readmission rates and uses financial penalties to incentivize hospitals to reduce unnecessary readmissions. Using HRRP readmission data from 2019-2022, this analysis aims to identify the preferred and non-preferred hospitals for hip and knee replacements on behalf of a health insurance company. It will also examine the risk factors associated with higher readmission rates for these procedures.

Question:

What risk factors are associated with hospital readmission rates for hip/knee replacements?

Motivation:

Understanding these risk factors can help health insurance companies guide patients toward hospitals with better outcomes, improving patient care and reducing the costs associated with readmissions.

Need:

The insights from this analysis can be used to improve hospital performance, enhance patient care, and reduce costs. As of 2019, the average cost of readmission after hip/knee surgery was $8,588, and avoiding that cost would be highly beneficial for health insurance companies and consumers alike (Phillips et al., 2019).

Novelty:

Previous analyses have applied Logistic Regression and Random Forest models to the same or similar datasets to identify the most important risk factors for hospital readmission after hip/knee replacement. We aim to improve on this work by improving model performance through regularization, hyperparameter tuning, and cross-validation. In addition, while prior analyses have used Random Forest models to extract important risk factors, to our knowledge none have used Random Forest to classify hospitals as preferred or non-preferred for hip/knee replacement based on those risk factors.

Hypothesis:

Hospitals with better Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) scores will have lower readmission rates for hip/knee replacements because higher patient satisfaction often correlates with better overall care quality and patient outcomes, including reduced complications and better post-discharge support (Edwards et al., 2015).

Data and Analysis

We will use datasets from the Centers for Medicare & Medicaid Services (Centers for Medicare & Medicaid Services, 2024). Our target variable will be the readmission rate after hip/knee surgery, measured over 2019-2022. Predictors will come from the HCAHPS (Hospital Consumer Assessment of Healthcare Providers and Systems) dataset and from three further datasets: Timely and Effective Care, which contains information on average wait times and vaccination compliance; Complications and Deaths, which contains the frequency of deaths and complications for procedures; and Payment and Spending metrics, which include the costs associated with procedures.

Analysis Plan:

  1. Data Preprocessing:
    o Handle missing values, outliers, and inconsistencies in the dataset.
    o Exploratory data analysis will be performed.
  2. Model Selection:
    o Random Forest: This model will be used to determine the most significant risk factors associated with hospital readmission, post hip/knee replacement surgery. The model will then be used in a binary classification task, which will categorize hospitals as either “preferred” or “non-preferred”, based on the national average for hospital readmission post hip/knee surgery.
    o Elastic Net: While prior analyses have used coefficients from Logistic Regression modeling to determine significant risk factors, this analysis will leverage L1 and L2 regularization via an Elastic Net model.
    o Neural Network: After using Elastic Net and Random Forest to find significant risk factors, these risk factors will be utilized in a neural network model and used to classify hospitals as either “preferred” or “non-preferred”. Classification ability of the neural network model will then be compared with the classification ability of the Random Forest model.
  3. Feature Importance Analysis:
    o Identify which predictors significantly influence hospital readmission rate, post hip/knee replacement surgery.
    o Significant risk factors from the regularized linear model (Elastic Net) will be compared with those from the tree-based model (Random Forest) to check for overlap.
  4. Model Validation:
    o Validate the model using cross-validation techniques (k-fold, nested) to ensure robustness and generalizability.
    o Hyperparameter tuning will be performed. Model performance will be assessed using accuracy, ROC curves, and the area under the ROC curve (AUC).
    o The 2024 data will be used as the test set for this analysis.
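
The labeling step described under Model Selection can be sketched as follows. This is a minimal illustration only: the data frame `hospitals`, its `readmission_rate` column, and the use of an unweighted mean across hospitals as a stand-in for the national average are all assumptions, and the threshold used in the final analysis may differ.

```r
# Hypothetical toy data; values are illustrative only, not taken from the CMS files
hospitals <- data.frame(
  FacilityId = c("A", "B", "C", "D"),
  readmission_rate = c(3.5, 4.8, 4.1, 5.6)
)

# Assumed stand-in for the national average: unweighted mean across hospitals
national_avg <- mean(hospitals$readmission_rate, na.rm = TRUE)

# Hospitals at or below the average are labeled "preferred"
hospitals$label <- factor(
  ifelse(hospitals$readmission_rate <= national_avg, "preferred", "non-preferred"),
  levels = c("non-preferred", "preferred")
)

# Inspect the class balance of the resulting binary target
table(hospitals$label)
```

Checking the class balance at this point is useful because a heavily imbalanced target would make accuracy a misleading metric.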

Assessment:

Successful Analysis:

We will consider our analysis successful if we can identify clear risk factors associated with hospital readmission rates and accurately classify hospitals as preferred or non-preferred.

Hypothesis Support:

Our hypothesis will be supported if hospitals with better HCAHPS scores demonstrate statistically significantly lower readmission rates for hip/knee replacements.
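
One way this criterion could be checked is a one-sided rank correlation test between an HCAHPS score and a readmission measure. The sketch below uses simulated data, so the variable names `hcahps_score` and `excess_ratio`, the simulated effect size, and the choice of Spearman's test are assumptions for illustration rather than the method used in this analysis.

```r
# Simulated data for illustration only; not drawn from the CMS files
set.seed(1)
n <- 200
hcahps_score <- runif(n, min = 70, max = 95)

# Simulate a negative association between score and readmission ratio
excess_ratio <- 1.8 - 0.01 * hcahps_score + rnorm(n, sd = 0.05)

# One-sided Spearman test: is a higher score associated with a lower ratio?
res <- cor.test(hcahps_score, excess_ratio,
                method = "spearman", alternative = "less")

res$estimate  # rank correlation (negative if the hypothesis holds)
res$p.value   # compare against the chosen significance level
```

A rank-based test avoids assuming a linear relationship, but it does not adjust for confounders such as hospital size.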

Pitfalls:

A potential pitfall of our analysis plan is data quality and completeness. The dataset contains missing values, so it will need to be preprocessed to handle missing values, outliers, and inconsistencies. Another potential pitfall is insufficient computing power to train deep learning models on a dataset of this size. Lastly, we need to watch for overfitting, which we will suspect if model accuracy on the training set far exceeds accuracy on the test set.
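
The overfitting check can be expressed as a simple comparison of training and test accuracy. In this sketch the label vectors are hypothetical and the 0.10 tolerance is an arbitrary illustrative choice, not a threshold prescribed by the analysis plan.

```r
# Proportion of predictions that match the true labels
accuracy <- function(actual, predicted) mean(actual == predicted)

# Hypothetical labels and predictions, for illustration only
train_actual <- c("preferred", "preferred", "non-preferred", "non-preferred", "preferred")
train_pred   <- c("preferred", "preferred", "non-preferred", "non-preferred", "preferred")
test_actual  <- c("preferred", "non-preferred", "preferred", "non-preferred")
test_pred    <- c("preferred", "preferred", "non-preferred", "non-preferred")

train_acc <- accuracy(train_actual, train_pred)
test_acc  <- accuracy(test_actual, test_pred)

# A large positive gap suggests the model memorized the training data
gap <- train_acc - test_acc
overfit_flag <- gap > 0.10  # assumed tolerance
```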

Exploratory Data Analysis

Data loading and preprocessing

Loading the Data (AC)

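# Load the packages used throughout this section
library(tidyverse)   # read_csv, dplyr verbs, pivot_wider, tribble
library(janitor)     # clean_names
library(naniar)      # replace_with_na_all
library(knitr)       # kable
library(kableExtra)  # kable_styling

# Set the directory for the data files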
filepath <- "/Users/adelinecasali/Desktop/hospitals_current_data/" 

# List the files in the directory whose names end in "Hospital.csv"
files <- list.files(path = filepath, pattern = "Hospital\\.csv$")

# Iterate through each file in the list
for (f in seq_along(files)) {

  # Read the CSV, clean column names to upper camel case, and store in "dat"
  dat <- clean_names(read_csv(paste0(filepath, files[f]),
                              show_col_types = FALSE),
                     case = "upper_camel")

  # Strip the "Hospital.csv" suffix and the single separator character
  # before it (matched by the leading ".") to create the variable name
  filename <- gsub(".Hospital\\.csv", "", files[f])

  # Assign data to a variable with the above created name
  assign(filename, dat)
}
# Create a df of file names without ".Hospital.csv"
files <- gsub(".Hospital\\.csv", "", files) %>% data.frame()

# Set column name of the df to "File Name"
names(files) <- "File Name"

files %>% 
  kable(
    format = "html",
    caption = "Table 1. List of hospital-level data files.") %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE)
Table 1. List of hospital-level data files.
File Name
Complications_and_Deaths
FY_2024_HAC_Reduction_Program
FY_2024_Hospital_Readmissions_Reduction_Program
HCAHPS
Healthcare_Associated_Infections
Maternal_Health
Medicare_Hospital_Spending_Per_Patient
Outpatient_Imaging_Efficiency
Payment_and_Value_of_Care
Timely_and_Effective_Care
Unplanned_Hospital_Visits

Exploring and Preprocessing the FY_2024_Hospital_Readmissions_Reduction_Program dataset (AC)

Viewing and checking for missing values

# Display first 10 rows of FY_2024_Hospital_Readmissions_Reduction_Program 
head(FY_2024_Hospital_Readmissions_Reduction_Program,10)
## # A tibble: 10 × 12
##    FacilityName         FacilityId State MeasureName NumberOfDischarges Footnote
##    <chr>                <chr>      <chr> <chr>       <chr>                 <dbl>
##  1 SOUTHEAST HEALTH ME… 010001     AL    READM-30-H… N/A                      NA
##  2 SOUTHEAST HEALTH ME… 010001     AL    READM-30-H… 616                      NA
##  3 SOUTHEAST HEALTH ME… 010001     AL    READM-30-A… 274                      NA
##  4 SOUTHEAST HEALTH ME… 010001     AL    READM-30-P… 404                      NA
##  5 SOUTHEAST HEALTH ME… 010001     AL    READM-30-C… 126                      NA
##  6 SOUTHEAST HEALTH ME… 010001     AL    READM-30-C… 117                      NA
##  7 MARSHALL MEDICAL CE… 010005     AL    READM-30-A… N/A                       1
##  8 MARSHALL MEDICAL CE… 010005     AL    READM-30-C… 137                      NA
##  9 MARSHALL MEDICAL CE… 010005     AL    READM-30-P… 285                      NA
## 10 MARSHALL MEDICAL CE… 010005     AL    READM-30-H… 129                      NA
## # ℹ 6 more variables: ExcessReadmissionRatio <chr>,
## #   PredictedReadmissionRate <chr>, ExpectedReadmissionRate <chr>,
## #   NumberOfReadmissions <chr>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  select_if(is.numeric)

# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## Footnote 
##    12077
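
The check above covers numeric columns only, so it reports just `Footnote`. Placeholders such as "N/A" are stored as text in the character columns and are not counted. A minimal sketch of a complementary check follows; the toy data frame and the list of placeholder strings are assumptions for illustration.

```r
# Hypothetical toy data mimicking the character-typed columns shown above
toy <- data.frame(
  NumberOfDischarges   = c("N/A", "616", "274", "N/A"),
  NumberOfReadmissions = c("Too Few to Report", "149", "32", "N/A"),
  stringsAsFactors = FALSE
)

# Placeholder strings to look for (assumed list; extend as needed)
sentinels <- c("N/A", "Too Few to Report")

# Count placeholder entries in every column
placeholder_counts <- sapply(toy, function(x) sum(x %in% sentinels))
print(placeholder_counts)
```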

Replacing “N/A” values with NA and “Too Few to Report” values with “5”

# Use the function "replace_with_na_all()" to replace "N/A" strings with NA
FY_2024_Hospital_Readmissions_Reduction_Program <- replace_with_na_all(FY_2024_Hospital_Readmissions_Reduction_Program, condition = ~ .x == "N/A")

# Replace "Too Few to Report" values with "5" using gsub
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions <- gsub("Too Few to Report", "5", FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions)

# Check first 10 rows to confirm that it worked
head(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions, 10)
##  [1] "5"   "149" "32"  "68"  "11"  "20"  NA    "14"  "40"  "24"
# Convert NumberOfReadmissions to numeric so the placeholder can be replaced with integers
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions <- as.numeric(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions)

# Find all values of "5" in NumberOfReadmissions
fives <- which(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions == 5)

# Replace values of "5" with random integers from 1 - 10
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions[fives] <- sample(1:10, length(fives), replace = TRUE)

# Check the first 20 rows to see if this was applied correctly
head(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions,20)
##  [1]   3 149  32  68  11  20  NA  14  40  24   3  NA  10  21  15  83  36  75   2
## [20]  NA
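
The steps above locate suppressed counts by matching the value 5 after the substitution, which would also overwrite any genuinely reported 5, and the random draw is not seeded. An alternative, sketched here on a hypothetical toy vector, is to record which entries were suppressed before converting and to fix a seed so the imputed values are reproducible. The seed value is arbitrary.

```r
# Hypothetical toy vector; the real column is NumberOfReadmissions
readmissions_raw <- c("Too Few to Report", "149", "5", NA, "Too Few to Report")

# Record which entries were suppressed before touching the values
suppressed <- !is.na(readmissions_raw) & readmissions_raw == "Too Few to Report"

# Convert to numeric; suppressed entries become NA for the moment
readmissions <- suppressWarnings(as.numeric(readmissions_raw))

# Arbitrary seed so the imputed values can be reproduced
set.seed(123)

# Impute only the suppressed entries with random integers from 1 to 10
readmissions[suppressed] <- sample(1:10, sum(suppressed), replace = TRUE)

readmissions
```

With this approach the reported 5 in the third position is left unchanged, and true missing values stay NA.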

Converting columns to numeric

# Selecting the columns to convert
columns_to_convert <- c("NumberOfDischarges", "ExcessReadmissionRatio", "PredictedReadmissionRate", "ExpectedReadmissionRate", "NumberOfReadmissions")

# Use mutate() with across() to convert the specified columns to numeric
FY_2024_Hospital_Readmissions_Reduction_Program <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  mutate(across(all_of(columns_to_convert), as.numeric))

# Print the structure of the dataframe to check the changes
str(FY_2024_Hospital_Readmissions_Reduction_Program)
## tibble [18,774 × 12] (S3: tbl_df/tbl/data.frame)
##  $ FacilityName            : chr [1:18774] "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" ...
##  $ FacilityId              : chr [1:18774] "010001" "010001" "010001" "010001" ...
##  $ State                   : chr [1:18774] "AL" "AL" "AL" "AL" ...
##  $ MeasureName             : chr [1:18774] "READM-30-HIP-KNEE-HRRP" "READM-30-HF-HRRP" "READM-30-AMI-HRRP" "READM-30-PN-HRRP" ...
##  $ NumberOfDischarges      : num [1:18774] NA 616 274 404 126 117 NA 137 285 129 ...
##  $ Footnote                : num [1:18774] NA NA NA NA NA NA 1 NA NA NA ...
##  $ ExcessReadmissionRatio  : num [1:18774] 0.892 1.1 0.933 0.987 0.952 ...
##  $ PredictedReadmissionRate: num [1:18774] 3.53 23.13 12.9 17.05 9.81 ...
##  $ ExpectedReadmissionRate : num [1:18774] 3.96 21.02 13.83 17.28 10.31 ...
##  $ NumberOfReadmissions    : num [1:18774] 3 149 32 68 11 20 NA 14 40 24 ...
##  $ StartDate               : chr [1:18774] "07/01/2019" "07/01/2019" "07/01/2019" "07/01/2019" ...
##  $ EndDate                 : chr [1:18774] "06/30/2022" "06/30/2022" "06/30/2022" "06/30/2022" ...

Removing excess text from measure names

FY_2024_Hospital_Readmissions_Reduction_Program <-  FY_2024_Hospital_Readmissions_Reduction_Program %>%
  mutate(MeasureName = gsub("READM-30-", "", MeasureName)) %>% 
  mutate(MeasureName = gsub("-HRRP", "", MeasureName)) 
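
The two `gsub()` calls above can also be written as a single anchored pattern. The sketch below shows this on a few measure names taken from the output earlier in this section. Anchoring with `^` and `$` is an optional safeguard, so the prefix and suffix are only removed at the start and end of the string.

```r
# Measure names as they appear before cleaning
measure_names <- c("READM-30-HIP-KNEE-HRRP", "READM-30-HF-HRRP", "READM-30-AMI-HRRP")

# "^" anchors the prefix to the start and "$" anchors the suffix to the end
short_names <- gsub("^READM-30-|-HRRP$", "", measure_names)
short_names
```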

Creating a dictionary for medical conditions

dict <- tribble(
  ~Acronym, ~Definition,
  "HIP-KNEE", "Total Hip/Knee Arthroplasty",
  "HF", "Heart Failure",
  "COPD", "Chronic Obstructive Pulmonary Disease",
  "AMI", "Acute Myocardial Infarction",
  "CABG", "Coronary Artery Bypass Graft",
  "PN", "Pneumonia"
)
dict %>% 
  kable(
    format = "html",
    caption = "Table 2. Acronyms of medical conditions for which hospital readmissions are tracked.") %>%
    kable_styling(bootstrap_options = "hover", full_width = FALSE)
Table 2. Acronyms of medical conditions for which hospital readmissions are tracked.
Acronym   Definition
HIP-KNEE  Total Hip/Knee Arthroplasty
HF        Heart Failure
COPD      Chronic Obstructive Pulmonary Disease
AMI       Acute Myocardial Infarction
CABG      Coronary Artery Bypass Graft
PN        Pneumonia

Pivoting the data wider

readmissionsClean <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  pivot_wider(
    names_from = MeasureName, 
    values_from = c(NumberOfDischarges, ExcessReadmissionRatio, PredictedReadmissionRate, ExpectedReadmissionRate, NumberOfReadmissions), 
    id_cols = c(FacilityName, FacilityId, State, StartDate, EndDate)
  )

# Check the new dataframe
dim(readmissionsClean)
## [1] 3129   35
head(readmissionsClean)
## # A tibble: 6 × 35
##   FacilityName         FacilityId State StartDate EndDate NumberOfDischarges_H…¹
##   <chr>                <chr>      <chr> <chr>     <chr>                    <dbl>
## 1 SOUTHEAST HEALTH ME… 010001     AL    07/01/20… 06/30/…                     NA
## 2 MARSHALL MEDICAL CE… 010005     AL    07/01/20… 06/30/…                     NA
## 3 NORTH ALABAMA MEDIC… 010006     AL    07/01/20… 06/30/…                     NA
## 4 MIZELL MEMORIAL HOS… 010007     AL    07/01/20… 06/30/…                     NA
## 5 CRENSHAW COMMUNITY … 010008     AL    07/01/20… 06/30/…                     NA
## 6 ST. VINCENT'S EAST   010011     AL    07/01/20… 06/30/…                     NA
## # ℹ abbreviated name: ¹​`NumberOfDischarges_HIP-KNEE`
## # ℹ 29 more variables: NumberOfDischarges_HF <dbl>,
## #   NumberOfDischarges_AMI <dbl>, NumberOfDischarges_PN <dbl>,
## #   NumberOfDischarges_CABG <dbl>, NumberOfDischarges_COPD <dbl>,
## #   `ExcessReadmissionRatio_HIP-KNEE` <dbl>, ExcessReadmissionRatio_HF <dbl>,
## #   ExcessReadmissionRatio_AMI <dbl>, ExcessReadmissionRatio_PN <dbl>,
## #   ExcessReadmissionRatio_CABG <dbl>, ExcessReadmissionRatio_COPD <dbl>, …
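
To make the reshaping concrete, the sketch below applies the same `pivot_wider()` pattern to a small toy data frame. The values are illustrative only. Each facility collapses to one row, and every combination of value column and measure becomes its own column.

```r
library(tidyr)

# Toy long-format data: one row per facility and measure (illustrative values)
long <- data.frame(
  FacilityId = c("010001", "010001", "010005", "010005"),
  MeasureName = c("HIP-KNEE", "HF", "HIP-KNEE", "HF"),
  NumberOfDischarges = c(NA, 616, 120, 137),
  ExcessReadmissionRatio = c(0.892, 1.1, 1.02, 0.95)
)

# One row per facility; column names are built as <value column>_<measure>
wide <- pivot_wider(
  long,
  id_cols = FacilityId,
  names_from = MeasureName,
  values_from = c(NumberOfDischarges, ExcessReadmissionRatio)
)

dim(wide)
names(wide)
```

Names containing a hyphen, such as `NumberOfDischarges_HIP-KNEE`, must be wrapped in backticks when referenced directly in later code.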

Filtering for only hip/knee conditions

readmissionsClean <- readmissionsClean %>%
  select(FacilityName, FacilityId, State, matches("HIP-KNEE$"))

Exploring and Preprocessing the HCAHPS dataset (AC)

Viewing and checking for missing values

# Display first 10 rows of HCAHPS 
head(HCAHPS,10)
## # A tibble: 10 × 22
##    FacilityId FacilityName           Address CityTown State ZipCode CountyParish
##    <chr>      <chr>                  <chr>   <chr>    <chr> <chr>   <chr>       
##  1 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  2 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  3 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  4 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  5 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  6 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  7 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  8 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  9 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## 10 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## # ℹ 15 more variables: TelephoneNumber <chr>, HcahpsMeasureId <chr>,
## #   HcahpsQuestion <chr>, HcahpsAnswerDescription <chr>,
## #   PatientSurveyStarRating <chr>, PatientSurveyStarRatingFootnote <dbl>,
## #   HcahpsAnswerPercent <chr>, HcahpsAnswerPercentFootnote <chr>,
## #   HcahpsLinearMeanValue <chr>, NumberOfCompletedSurveys <chr>,
## #   NumberOfCompletedSurveysFootnote <chr>, SurveyResponseRatePercent <chr>,
## #   SurveyResponseRatePercentFootnote <chr>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- HCAHPS %>%
  select_if(is.numeric)

# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## PatientSurveyStarRatingFootnote 
##                          430641

Removing footnote columns and replacing placeholder values with NA

# Removing all footnote columns
HCAHPS <- HCAHPS %>%
  select(-ends_with("footnote"))

# Replacing all "Not Applicable" and "Not Available" values with NA in one pass
HCAHPS <- HCAHPS %>%
  mutate(across(where(is.character),
                ~ replace(.x, .x %in% c("Not Applicable", "Not Available"), NA)))

Creating a dictionary for HCAHPS questions

dictHCAHPS <- tribble(
  ~`Measure ID`, ~`Measure Name`,
  "H-CLEAN-HSP-A-P", "Patients who reported that their room and bathroom were 'Always' clean",
  "H-CLEAN-HSP-SN-P", "Patients who reported that their room and bathroom were 'Sometimes' or 'Never' clean",
  "H-CLEAN-HSP-U-P", "Patients who reported that their room and bathroom were 'Usually' clean",
  "H-CLEAN-HSP-STAR-RATING", "Cleanliness - star rating",
  "H_CLEAN_LINEAR_SCORE", "Cleanliness - linear mean score",
  "H-COMP-1-A-P", "Patients who reported that their nurses 'Always' communicated well",
  "H-COMP-1-SN-P", "Patients who reported that their nurses 'Sometimes' or 'Never' communicated well",
  "H-COMP-1-U-P", "Patients who reported that their nurses 'Usually' communicated well",
  "H-COMP-1-STAR-RATING", "Nurse communication - star rating",
  "H_COMP_1_LINEAR_SCORE", "Nurse communication - linear mean score",
  "H-COMP-2-A-P", "Patients who reported that their doctors 'Always' communicated well",
  "H-COMP-2-SN-P", "Patients who reported that their doctors 'Sometimes' or 'Never' communicated well",
  "H-COMP-2-U-P", "Patients who reported that their doctors 'Usually' communicated well",
  "H-COMP-2-STAR-RATING", "Doctor communication - star rating",
  "H_COMP_2_LINEAR_SCORE", "Doctor communication - linear mean score",
  "H-COMP-3-A-P", "Patients who reported that they 'Always' received help as soon as they wanted",
  "H-COMP-3-SN-P", "Patients who reported that they 'Sometimes' or 'Never' received help as soon as they wanted",
  "H-COMP-3-U-P", "Patients who reported that they 'Usually' received help as soon as they wanted",
  "H-COMP-3-STAR-RATING", "Staff responsiveness - star rating",
  "H_COMP_3_LINEAR_SCORE", "Staff responsiveness - linear mean score",
  "H-COMP-5-A-P", "Patients who reported that staff 'Always' explained about medicines before giving it to them",
  "H-COMP-5-SN-P", "Patients who reported that staff 'Sometimes' or 'Never' explained about medicines before giving it to them",
  "H-COMP-5-U-P", "Patients who reported that staff 'Usually' explained about medicines before giving it to them",
  "H-COMP-5-STAR-RATING", "Communication about medicine - star rating",
  "H_COMP_5_LINEAR_SCORE", "Communication about medicines - linear mean score",
  "H-COMP-6-N-P", "Patients who reported that NO, they were not given information about what to do during their recovery at home",
  "H-COMP-6-Y-P", "Patients who reported that YES, they were given information about what to do during their recovery at home",
  "H-COMP-6-STAR-RATING", "Discharge information - star rating",
  "H_COMP_6_LINEAR_SCORE", "Discharge information - linear mean score",
  "H-COMP-7-A", "Patients who 'Agree' they understood their care when they left the hospital",
  "H-COMP-7-D-SD", "Patients who 'Disagree' or 'Strongly Disagree' that they understood their care when they left the hospital",
  "H-COMP-7-SA", "Patients who 'Strongly Agree' that they understood their care when they left the hospital",
  "H-COMP-7-STAR-RATING", "Care transition - star rating",
  "H_COMP_7_LINEAR_SCORE", "Care transition - linear mean score",
  "H-HSP-RATING-0-6", "Patients who gave their hospital a rating of 6 or lower on a scale from 0 (lowest) to 10 (highest)",
  "H-HSP-RATING-7-8", "Patients who gave their hospital a rating of 7 or 8 on a scale from 0 (lowest) to 10 (highest)",
  "H-HSP-RATING-9-10", "Patients who gave their hospital a rating of 9 or 10 on a scale from 0 (lowest) to 10 (highest)",
  "H-HSP-RATING-STAR-RATING", "Overall rating of hospital - star rating",
  "H_HSP_RATING_LINEAR_SCORE", "Overall hospital rating - linear mean score",
  "H-QUIET-HSP-A-P", "Patients who reported that the area around their room was 'Always' quiet at night",
  "H-QUIET-HSP-SN-P", "Patients who reported that the area around their room was 'Sometimes' or 'Never' quiet at night",
  "H-QUIET-HSP-U-P", "Patients who reported that the area around their room was 'Usually' quiet at night",
  "H-QUIET-HSP-STAR-RATING", "Quietness - star rating",
  "H_QUIET_LINEAR_SCORE", "Quietness - linear mean score",
  "H-RECMND-DN", "Patients who reported NO, they would probably not or definitely not recommend the hospital",
  "H-RECMND-DY", "Patients who reported YES, they would definitely recommend the hospital",
  "H-RECMND-PY", "Patients who reported YES, they would probably recommend the hospital",
  "H-RECMND-STAR-RATING", "Recommend hospital - star rating",
  "H_RECMND_LINEAR_SCORE", "Recommend hospital - linear mean score",
  "H-STAR-RATING", "Summary star rating"
)

dictHCAHPS %>% 
  kable(
    format = "html",
    caption = "Table 3. Measure IDs and Measure Names from HCAHPS") %>%
    kable_styling(bootstrap_options = "hover", full_width = FALSE)
Table 3. Measure IDs and Measure Names from HCAHPS
Measure ID Measure Name
H-CLEAN-HSP-A-P Patients who reported that their room and bathroom were ‘Always’ clean
H-CLEAN-HSP-SN-P Patients who reported that their room and bathroom were ‘Sometimes’ or ‘Never’ clean
H-CLEAN-HSP-U-P Patients who reported that their room and bathroom were ‘Usually’ clean
H-CLEAN-HSP-STAR-RATING Cleanliness - star rating
H_CLEAN_LINEAR_SCORE Cleanliness - linear mean score
H-COMP-1-A-P Patients who reported that their nurses ‘Always’ communicated well
H-COMP-1-SN-P Patients who reported that their nurses ‘Sometimes’ or ‘Never’ communicated well
H-COMP-1-U-P Patients who reported that their nurses ‘Usually’ communicated well
H-COMP-1-STAR-RATING Nurse communication - star rating
H_COMP_1_LINEAR_SCORE Nurse communication - linear mean score
H-COMP-2-A-P Patients who reported that their doctors ‘Always’ communicated well
H-COMP-2-SN-P Patients who reported that their doctors ‘Sometimes’ or ‘Never’ communicated well
H-COMP-2-U-P Patients who reported that their doctors ‘Usually’ communicated well
H-COMP-2-STAR-RATING Doctor communication - star rating
H_COMP_2_LINEAR_SCORE Doctor communication - linear mean score
H-COMP-3-A-P Patients who reported that they ‘Always’ received help as soon as they wanted
H-COMP-3-SN-P Patients who reported that they ‘Sometimes’ or ‘Never’ received help as soon as they wanted
H-COMP-3-U-P Patients who reported that they ‘Usually’ received help as soon as they wanted
H-COMP-3-STAR-RATING Staff responsiveness - star rating
H_COMP_3_LINEAR_SCORE Staff responsiveness - linear mean score
H-COMP-5-A-P Patients who reported that staff ‘Always’ explained about medicines before giving it to them
H-COMP-5-SN-P Patients who reported that staff ‘Sometimes’ or ‘Never’ explained about medicines before giving it to them
H-COMP-5-U-P Patients who reported that staff ‘Usually’ explained about medicines before giving it to them
H-COMP-5-STAR-RATING Communication about medicine - star rating
H_COMP_5_LINEAR_SCORE Communication about medicines - linear mean score
H-COMP-6-N-P Patients who reported that NO, they were not given information about what to do during their recovery at home
H-COMP-6-Y-P Patients who reported that YES, they were given information about what to do during their recovery at home
H-COMP-6-STAR-RATING Discharge information - star rating
H_COMP_6_LINEAR_SCORE Discharge information - linear mean score
H-COMP-7-A Patients who ‘Agree’ they understood their care when they left the hospital
H-COMP-7-D-SD Patients who ‘Disagree’ or ‘Strongly Disagree’ that they understood their care when they left the hospital
H-COMP-7-SA Patients who ‘Strongly Agree’ that they understood their care when they left the hospital
H-COMP-7-STAR-RATING Care transition - star rating
H_COMP_7_LINEAR_SCORE Care transition - linear mean score
H-HSP-RATING-0-6 Patients who gave their hospital a rating of 6 or lower on a scale from 0 (lowest) to 10 (highest)
H-HSP-RATING-7-8 Patients who gave their hospital a rating of 7 or 8 on a scale from 0 (lowest) to 10 (highest)
H-HSP-RATING-9-10 Patients who gave their hospital a rating of 9 or 10 on a scale from 0 (lowest) to 10 (highest)
H-HSP-RATING-STAR-RATING Overall rating of hospital - star rating
H_HSP_RATING_LINEAR_SCORE Overall hospital rating - linear mean score
H-QUIET-HSP-A-P Patients who reported that the area around their room was ‘Always’ quiet at night
H-QUIET-HSP-SN-P Patients who reported that the area around their room was ‘Sometimes’ or ‘Never’ quiet at night
H-QUIET-HSP-U-P Patients who reported that the area around their room was ‘Usually’ quiet at night
H-QUIET-HSP-STAR-RATING Quietness - star rating
H_QUIET_LINEAR_SCORE Quietness - linear mean score
H-RECMND-DN Patients who reported NO, they would probably not or definitely not recommend the hospital
H-RECMND-DY Patients who reported YES, they would definitely recommend the hospital
H-RECMND-PY Patients who reported YES, they would probably recommend the hospital
H-RECMND-STAR-RATING Recommend hospital - star rating
H_RECMND_LINEAR_SCORE Recommend hospital - linear mean score
H-STAR-RATING Summary star rating

Pivoting the data wider

HCAHPSClean <- HCAHPS %>%
  pivot_wider(
    names_from = HcahpsMeasureId, 
    values_from = c(PatientSurveyStarRating, HcahpsAnswerPercent, HcahpsLinearMeanValue, SurveyResponseRatePercent), 
    id_cols = c(FacilityName, FacilityId, State)
  )

# Check the new dataframe
dim(HCAHPSClean)
## [1] 4814  375
head(HCAHPSClean)
## # A tibble: 6 × 375
##   FacilityName    FacilityId State PatientSurveyStarRat…¹ PatientSurveyStarRat…²
##   <chr>           <chr>      <chr> <chr>                  <chr>                 
## 1 SOUTHEAST HEAL… 010001     AL    <NA>                   <NA>                  
## 2 MARSHALL MEDIC… 010005     AL    <NA>                   <NA>                  
## 3 NORTH ALABAMA … 010006     AL    <NA>                   <NA>                  
## 4 MIZELL MEMORIA… 010007     AL    <NA>                   <NA>                  
## 5 CRENSHAW COMMU… 010008     AL    <NA>                   <NA>                  
## 6 ST. VINCENT'S … 010011     AL    <NA>                   <NA>                  
## # ℹ abbreviated names: ¹​PatientSurveyStarRating_H_COMP_1_A_P,
## #   ²​PatientSurveyStarRating_H_COMP_1_SN_P
## # ℹ 370 more variables: PatientSurveyStarRating_H_COMP_1_U_P <chr>,
## #   PatientSurveyStarRating_H_COMP_1_LINEAR_SCORE <chr>,
## #   PatientSurveyStarRating_H_COMP_1_STAR_RATING <chr>,
## #   PatientSurveyStarRating_H_NURSE_RESPECT_A_P <chr>,
## #   PatientSurveyStarRating_H_NURSE_RESPECT_SN_P <chr>, …
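
The preview above shows columns such as `PatientSurveyStarRating_H_COMP_1_A_P` that are NA in the first rows, which suggests that many of the 375 columns may be empty because each measure fills only some of the four value columns. A possible follow-up step, sketched on a hypothetical toy data frame with made-up values, is to drop columns that are entirely NA and convert the remaining value columns to numeric.

```r
# Hypothetical toy data mimicking the wide HCAHPS layout; values are made up
toy_wide <- data.frame(
  FacilityId = c("010001", "010005"),
  PatientSurveyStarRating_H_COMP_1_A_P = c(NA_character_, NA_character_),
  HcahpsAnswerPercent_H_COMP_1_A_P = c("78", "81"),
  PatientSurveyStarRating_H_COMP_1_STAR_RATING = c("4", NA),
  stringsAsFactors = FALSE
)

# Keep only columns with at least one non-missing value
keep <- colSums(!is.na(toy_wide)) > 0
toy_trimmed <- toy_wide[, keep, drop = FALSE]

# Convert every column except the identifier to numeric
value_cols <- setdiff(names(toy_trimmed), "FacilityId")
toy_trimmed[value_cols] <- lapply(toy_trimmed[value_cols], as.numeric)

str(toy_trimmed)
```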

Exploring and Preprocessing the Timely_and_Effective_Care dataset (AC)

Viewing and checking for missing values

# Display first 10 rows of Timely_and_Effective_Care
head(Timely_and_Effective_Care,10)
## # A tibble: 10 × 16
##    FacilityId FacilityName           Address CityTown State ZipCode CountyParish
##    <chr>      <chr>                  <chr>   <chr>    <chr> <chr>   <chr>       
##  1 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  2 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  3 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  4 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  5 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  6 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  7 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  8 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  9 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## 10 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## # ℹ 9 more variables: TelephoneNumber <chr>, Condition <chr>, MeasureId <chr>,
## #   MeasureName <chr>, Score <chr>, Sample <chr>, Footnote <chr>,
## #   StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- Timely_and_Effective_Care %>%
  select_if(is.numeric)

# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## named list()

Replacing placeholder values with NA

# Replacing all "Not Applicable" and "Not Available" values with NA in one pass
Timely_and_Effective_Care <- Timely_and_Effective_Care %>%
  mutate(across(where(is.character),
                ~ replace(.x, .x %in% c("Not Applicable", "Not Available"), NA)))

Creating a dictionary of Timely and Effective Care measure names

dictCare <- tribble(
  ~`Measure ID`, ~`Measure Name`,
  "EDV", "Emergency department volume (alternate Measure ID: EDV-1)",
  "ED-2", "Average (median) admit decision time to time of departure from the emergency department for emergency department patients admitted to inpatient status",
  "IMM-3", "Healthcare workers given influenza vaccination",
  "HCP COVID-19", "COVID-19 Vaccination Coverage Among HCP",
  "OP-18b", "Average (median) time patients spent in the emergency department before leaving from the visit (alternate Measure ID: OP-18)",
  "OP-18c", "Average time patients spent in the emergency department before being sent home (Median Time from ED Arrival to ED Departure for Discharged ED Patients – Psychiatric/Mental Health Patients) *This measure is only found in the downloadable database, it is not displayed on Hospital Care Compare",
  "OP-22", "Percentage of patients who left the emergency department before being seen",
  "OP-23", "Percentage of patients who came to the emergency department with stroke symptoms who received brain scan results within 45 minutes of arrival",
  "OP-29", "Percentage of patients receiving appropriate recommendation for follow-up screening colonoscopy",
  "OP-31", "Percentage of patients who had cataract surgery and had improvement in visual function within 90 days following the surgery",
  "SEP-1", "Severe Sepsis and Septic Shock",
  "SEP-SH-3HR", "Septic Shock 3 Hour",
  "SEP-SH-6HR", "Septic Shock 6 Hour",
  "SEV-SEP-3HR", "Severe Sepsis 3 Hour",
  "SEV-SEP-6HR", "Severe Sepsis 6 Hour",
  "STK-02", "Percentage of ischemic stroke patients prescribed or continuing to take antithrombotic therapy at hospital discharge",
  "STK-03", "Percentage of ischemic stroke patients with atrial fibrillation/flutter who are prescribed or continuing to take anticoagulation therapy at hospital discharge",
  "STK-05", "Percentage of ischemic stroke patients administered antithrombotic therapy by the end of hospital day 2",
  "STK-06", "Percentage of ischemic stroke patients who are prescribed or continuing to take statin medication at hospital discharge",
  "VTE-1", "Percentage of patients that received VTE prophylaxis after hospital admission or surgery",
  "VTE-2", "Percentage of patients that received VTE prophylaxis after being admitted to the intensive care unit (ICU)",
  "Safe Use of Opioids", "Percentage of patients who were prescribed 2 or more opioids or an opioid and benzodiazepine concurrently at discharge"
)

dictCare %>% 
  kable(
    format = "html",
    caption = "Table 4. Measure IDs and Measure Names from Timely and Effective Care") %>%
    kable_styling(bootstrap_options = "hover", full_width = FALSE)
Table 4. Measure IDs and Measure Names from Timely and Effective Care
Measure ID | Measure Name
EDV | Emergency department volume (alternate Measure ID: EDV-1)
ED-2 | Average (median) admit decision time to time of departure from the emergency department for emergency department patients admitted to inpatient status
IMM-3 | Healthcare workers given influenza vaccination
HCP COVID-19 | COVID-19 Vaccination Coverage Among HCP
OP-18b | Average (median) time patients spent in the emergency department before leaving from the visit (alternate Measure ID: OP-18)
OP-18c | Average time patients spent in the emergency department before being sent home (Median Time from ED Arrival to ED Departure for Discharged ED Patients – Psychiatric/Mental Health Patients) *This measure is only found in the downloadable database, it is not displayed on Hospital Care Compare
OP-22 | Percentage of patients who left the emergency department before being seen
OP-23 | Percentage of patients who came to the emergency department with stroke symptoms who received brain scan results within 45 minutes of arrival
OP-29 | Percentage of patients receiving appropriate recommendation for follow-up screening colonoscopy
OP-31 | Percentage of patients who had cataract surgery and had improvement in visual function within 90 days following the surgery
SEP-1 | Severe Sepsis and Septic Shock
SEP-SH-3HR | Septic Shock 3 Hour
SEP-SH-6HR | Septic Shock 6 Hour
SEV-SEP-3HR | Severe Sepsis 3 Hour
SEV-SEP-6HR | Severe Sepsis 6 Hour
STK-02 | Percentage of ischemic stroke patients prescribed or continuing to take antithrombotic therapy at hospital discharge
STK-03 | Percentage of ischemic stroke patients with atrial fibrillation/flutter who are prescribed or continuing to take anticoagulation therapy at hospital discharge
STK-05 | Percentage of ischemic stroke patients administered antithrombotic therapy by the end of hospital day 2
STK-06 | Percentage of ischemic stroke patients who are prescribed or continuing to take statin medication at hospital discharge
VTE-1 | Percentage of patients that received VTE prophylaxis after hospital admission or surgery
VTE-2 | Percentage of patients that received VTE prophylaxis after being admitted to the intensive care unit (ICU)
Safe Use of Opioids | Percentage of patients who were prescribed 2 or more opioids or an opioid and benzodiazepine concurrently at discharge

Pivoting the data wider

careClean <- Timely_and_Effective_Care %>%
  pivot_wider(
    names_from = MeasureId, 
    values_from = c(Score), 
    id_cols = c(FacilityName, FacilityId, State)
  )

# Check the new dataframe
dim(careClean)
## [1] 4677   26
head(careClean)
## # A tibble: 6 × 26
##   FacilityName   FacilityId State EDV   ED_2_Strata_1 ED_2_Strata_2 HCP_COVID_19
##   <chr>          <chr>      <chr> <chr> <chr>         <chr>         <chr>       
## 1 SOUTHEAST HEA… 010001     AL    high  <NA>          <NA>          80.7        
## 2 MARSHALL MEDI… 010005     AL    high  148           105           79.8        
## 3 NORTH ALABAMA… 010006     AL    high  <NA>          <NA>          79          
## 4 MIZELL MEMORI… 010007     AL    low   <NA>          <NA>          57.9        
## 5 CRENSHAW COMM… 010008     AL    low   <NA>          <NA>          81.2        
## 6 ST. VINCENT'S… 010011     AL    high  <NA>          <NA>          88          
## # ℹ 19 more variables: IMM_3 <chr>, OP_18b <chr>, OP_18c <chr>, OP_22 <chr>,
## #   OP_23 <chr>, OP_29 <chr>, OP_31 <chr>, SAFE_USE_OF_OPIOIDS <chr>,
## #   SEP_1 <chr>, SEP_SH_3HR <chr>, SEP_SH_6HR <chr>, SEV_SEP_3HR <chr>,
## #   SEV_SEP_6HR <chr>, STK_02 <chr>, STK_03 <chr>, STK_05 <chr>, STK_06 <chr>,
## #   VTE_1 <chr>, VTE_2 <chr>
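Widening only gives one value per cell when each facility/measure pair appears once in the long table; otherwise pivot_wider() warns and returns list-columns. The following is a minimal sketch of that check on made-up data. The table toy_long and its values are hypothetical and are not taken from Timely_and_Effective_Care.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per facility and measure
toy_long <- tribble(
  ~FacilityId, ~MeasureId, ~Score,
  "010001",    "OP_22",    "5",
  "010001",    "IMM_3",    "95",
  "010005",    "OP_22",    "3"
)

# Facility/measure pairs that occur more than once would not fit in a single cell
dupes <- toy_long %>%
  count(FacilityId, MeasureId) %>%
  filter(n > 1)
nrow(dupes)

# With unique pairs, each measure becomes one column; missing pairs become NA
toy_wide <- toy_long %>%
  pivot_wider(
    id_cols = FacilityId,
    names_from = MeasureId,
    values_from = Score
  )
toy_wide
```

In this toy table the second facility has no IMM_3 row, so its IMM_3 cell is NA after widening, which is the same mechanism that produces the NA entries in the wide tables here.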

Exploring and Preprocessing the Complications_and_Deaths dataset (AC)

Viewing and checking for missing values

# Display first 10 rows of Complications_and_Deaths
head(Complications_and_Deaths,10)
## # A tibble: 10 × 18
##    FacilityId FacilityName           Address CityTown State ZipCode CountyParish
##    <chr>      <chr>                  <chr>   <chr>    <chr> <chr>   <chr>       
##  1 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  2 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  3 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  4 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  5 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  6 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  7 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  8 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  9 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## 10 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## # ℹ 11 more variables: TelephoneNumber <chr>, MeasureId <chr>,
## #   MeasureName <chr>, ComparedToNational <chr>, Denominator <chr>,
## #   Score <chr>, LowerEstimate <chr>, HigherEstimate <chr>, Footnote <chr>,
## #   StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- Complications_and_Deaths %>%
  select_if(is.numeric)

# Check for missing values (every column in this dataset is character, so the numeric subset is empty)
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## named list()
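Because every column in this dataset is read as character, the numeric-only check returns an empty result and missing data is instead encoded as text. A minimal sketch of counting those placeholder strings per column is shown below. The data frame toy is hypothetical, and the two placeholder strings are the ones this report replaces.

```r
# Hypothetical stand-in for a table in which every column is read as character
toy <- data.frame(
  Score       = c("2.7", "Not Available", "4.6"),
  Denominator = c("Not Applicable", "120", "Not Available"),
  stringsAsFactors = FALSE
)

# Placeholder strings that stand in for missing values
placeholders <- c("Not Available", "Not Applicable")

# Count placeholder strings in each character column
placeholder_counts <- sapply(
  toy[sapply(toy, is.character)],
  function(x) sum(x %in% placeholders)
)
print(placeholder_counts)
```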

Replacing NA values

# Replacing all "Not Applicable" and "Not Available" strings with NA.
# Only character columns are touched, so other column types are preserved.
Complications_and_Deaths <- Complications_and_Deaths %>%
  mutate(across(
    where(is.character),
    ~ replace(.x, .x %in% c("Not Applicable", "Not Available"), NA)
  ))
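An alternative is to declare the placeholder strings as missing when the file is read, so no replacement step is needed afterwards. This is a sketch only: the CSV text is made up, the column types are chosen for illustration, and passing literal data through I() assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical CSV content standing in for a download; I() marks it as literal data
csv_text <- I("FacilityId,Score\n010001,2.7\n010007,Not Available\n010008,Not Applicable\n")

# Treat the placeholder strings as NA while parsing
toy <- read_csv(
  csv_text,
  na = c("", "NA", "Not Available", "Not Applicable"),
  col_types = cols(FacilityId = col_character(), Score = col_double())
)
toy
```

With the placeholders handled at import, Score can be parsed directly as a number, and FacilityId keeps its leading zero because it is read as character.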

Creating a dictionary for Complications and Deaths measure names

dictDeaths <- tribble(
  ~`Measure ID`, ~`Measure Name`,
  "COMP-HIP-KNEE", "Rate of complications for hip/knee replacement patients",
  "PSI 90", "Serious complications (this is a composite or summary measure; alternate Measure ID: PSI-90-SAFETY)",
  "PSI 03", "Pressure sores (alternate Measure ID: PSI_3_Ulcer)",
  "PSI 04", "Deaths among patients with serious treatable complications after surgery (alternate Measure ID: PSI-4-SURG-COMP)",
  "PSI 06", "Collapsed lung due to medical treatment (alternate Measure ID: PSI-6-IAT-PTX)",
  "PSI 08", "Broken hip from a fall after surgery (alternate Measure ID: PSI_8_POST_HIP)",
  "PSI 09", "Postoperative hemorrhage or hematoma rate (alternate Measure ID: PSI_9_POST_HEM)",
  "PSI 10", "Kidney and diabetic complications after surgery (alternate Measure ID: PSI_10_POST_KIDNEY)",
  "PSI 11", "Respiratory failure after surgery (alternate Measure ID: PSI_11_POST_RESP)",
  "PSI 12", "Serious blood clots after surgery (alternate Measure ID: PSI-12-POSTOP-PULMEMB-DVT)",
  "PSI 13", "Blood stream infection after surgery (alternate Measure ID: PSI_13_POST_SEPSIS)",
  "PSI 14", "A wound that splits open after surgery on the abdomen or pelvis (alternate Measure ID: PSI-14-POSTOP-DEHIS)",
  "PSI 15", "Accidental cuts and tears from medical treatment (alternate Measure ID: PSI-15-ACC-LAC)",
  "MORT-30-AMI", "Death rate for heart attack patients",
  "MORT-30-CABG", "Death rate for Coronary Artery Bypass Graft (CABG) surgery patients",
  "MORT-30-COPD", "Death rate for chronic obstructive pulmonary disease (COPD) patients",
  "MORT-30-HF", "Death rate for heart failure patients",
  "MORT-30-PN", "Death rate for pneumonia patients",
  "MORT-30-STK", "Death rate for stroke patients"
)

dictDeaths %>% 
  kable(
    format = "html",
    caption = "Table 5. Measure IDs and Measure Names from Complications and Deaths") %>%
    kable_styling(bootstrap_options = "hover", full_width = FALSE)
Table 5. Measure IDs and Measure Names from Complications and Deaths
Measure ID | Measure Name
COMP-HIP-KNEE | Rate of complications for hip/knee replacement patients
PSI 90 | Serious complications (this is a composite or summary measure; alternate Measure ID: PSI-90-SAFETY)
PSI 03 | Pressure sores (alternate Measure ID: PSI_3_Ulcer)
PSI 04 | Deaths among patients with serious treatable complications after surgery (alternate Measure ID: PSI-4-SURG-COMP)
PSI 06 | Collapsed lung due to medical treatment (alternate Measure ID: PSI-6-IAT-PTX)
PSI 08 | Broken hip from a fall after surgery (alternate Measure ID: PSI_8_POST_HIP)
PSI 09 | Postoperative hemorrhage or hematoma rate (alternate Measure ID: PSI_9_POST_HEM)
PSI 10 | Kidney and diabetic complications after surgery (alternate Measure ID: PSI_10_POST_KIDNEY)
PSI 11 | Respiratory failure after surgery (alternate Measure ID: PSI_11_POST_RESP)
PSI 12 | Serious blood clots after surgery (alternate Measure ID: PSI-12-POSTOP-PULMEMB-DVT)
PSI 13 | Blood stream infection after surgery (alternate Measure ID: PSI_13_POST_SEPSIS)
PSI 14 | A wound that splits open after surgery on the abdomen or pelvis (alternate Measure ID: PSI-14-POSTOP-DEHIS)
PSI 15 | Accidental cuts and tears from medical treatment (alternate Measure ID: PSI-15-ACC-LAC)
MORT-30-AMI | Death rate for heart attack patients
MORT-30-CABG | Death rate for Coronary Artery Bypass Graft (CABG) surgery patients
MORT-30-COPD | Death rate for chronic obstructive pulmonary disease (COPD) patients
MORT-30-HF | Death rate for heart failure patients
MORT-30-PN | Death rate for pneumonia patients
MORT-30-STK | Death rate for stroke patients

Pivoting the data wider

deathsClean <- Complications_and_Deaths %>%
  pivot_wider(
    names_from = MeasureId, 
    values_from = c(ComparedToNational, Score), 
    id_cols = c(FacilityName, FacilityId, State)
  )

# Check the new dataframe
dim(deathsClean)
## [1] 4814   41
head(deathsClean)
## # A tibble: 6 × 41
##   FacilityName    FacilityId State ComparedToNational_C…¹ ComparedToNational_M…²
##   <chr>           <chr>      <chr> <chr>                  <chr>                 
## 1 SOUTHEAST HEAL… 010001     AL    No Different Than the… No Different Than the…
## 2 MARSHALL MEDIC… 010005     AL    No Different Than the… No Different Than the…
## 3 NORTH ALABAMA … 010006     AL    No Different Than the… Worse Than the Nation…
## 4 MIZELL MEMORIA… 010007     AL    Number of Cases Too S… Number of Cases Too S…
## 5 CRENSHAW COMMU… 010008     AL    <NA>                   Number of Cases Too S…
## 6 ST. VINCENT'S … 010011     AL    No Different Than the… No Different Than the…
## # ℹ abbreviated names: ¹​ComparedToNational_COMP_HIP_KNEE,
## #   ²​ComparedToNational_MORT_30_AMI
## # ℹ 36 more variables: ComparedToNational_MORT_30_CABG <chr>,
## #   ComparedToNational_MORT_30_COPD <chr>, ComparedToNational_MORT_30_HF <chr>,
## #   ComparedToNational_MORT_30_PN <chr>, ComparedToNational_MORT_30_STK <chr>,
## #   ComparedToNational_PSI_03 <chr>, ComparedToNational_PSI_04 <chr>,
## #   ComparedToNational_PSI_06 <chr>, ComparedToNational_PSI_08 <chr>, …

Exploring and Preprocessing the Payment_and_Value_of_Care dataset (AC)

Viewing and checking for missing values

# Display first 10 rows of Payment_and_Value_of_Care
head(Payment_and_Value_of_Care,10)
## # A tibble: 10 × 22
##    FacilityId FacilityName           Address CityTown State ZipCode CountyParish
##    <chr>      <chr>                  <chr>   <chr>    <chr> <chr>   <chr>       
##  1 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  2 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  3 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  4 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  5 010005     MARSHALL MEDICAL CENT… 2505 U… BOAZ     AL    35957   MARSHALL    
##  6 010005     MARSHALL MEDICAL CENT… 2505 U… BOAZ     AL    35957   MARSHALL    
##  7 010005     MARSHALL MEDICAL CENT… 2505 U… BOAZ     AL    35957   MARSHALL    
##  8 010005     MARSHALL MEDICAL CENT… 2505 U… BOAZ     AL    35957   MARSHALL    
##  9 010006     NORTH ALABAMA MEDICAL… 1701 V… FLORENCE AL    35630   LAUDERDALE  
## 10 010006     NORTH ALABAMA MEDICAL… 1701 V… FLORENCE AL    35630   LAUDERDALE  
## # ℹ 15 more variables: TelephoneNumber <chr>, PaymentMeasureId <chr>,
## #   PaymentMeasureName <chr>, PaymentCategory <chr>, Denominator <chr>,
## #   Payment <chr>, LowerEstimate <chr>, HigherEstimate <chr>,
## #   PaymentFootnote <dbl>, ValueOfCareDisplayId <chr>,
## #   ValueOfCareDisplayName <chr>, ValueOfCareCategory <chr>,
## #   ValueOfCareFootnote <dbl>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- Payment_and_Value_of_Care %>%
  select_if(is.numeric)

# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
##     PaymentFootnote ValueOfCareFootnote 
##                9956               10044

Replacing NA values

# Replacing all "Not Applicable" with NA
Payment_and_Value_of_Care <- as.data.frame(sapply(Payment_and_Value_of_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Applicable"] <- NA
  }
  return(x)
}))

# Replacing all "Not Available" with NA
Payment_and_Value_of_Care <- as.data.frame(sapply(Payment_and_Value_of_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Available"] <- NA
  }
  return(x)
}))

Creating a dictionary for Payment and Value of Care measure names

dictPayment <- tribble(
  ~`Measure ID`, ~`Measure Name`,
  "PAYM-30-AMI", "Payment for heart attack patients",
  "PAYM-30-HF", "Payment for heart failure patients",
  "PAYM-30-PN", "Payment for pneumonia patients",
  "PAYM_90_HIP_KNEE", "Payment for hip/knee replacement patients"
)

dictPayment %>% 
  kable(
    format = "html",
    caption = "Table 6. Measure IDs and Measure Names from Payment and Value of Care") %>%
    kable_styling(bootstrap_options = "hover", full_width = FALSE)
Table 6. Measure IDs and Measure Names from Payment and Value of Care
Measure ID | Measure Name
PAYM-30-AMI | Payment for heart attack patients
PAYM-30-HF | Payment for heart failure patients
PAYM-30-PN | Payment for pneumonia patients
PAYM_90_HIP_KNEE | Payment for hip/knee replacement patients

Pivoting the data wider

paymentClean <- Payment_and_Value_of_Care %>%
  pivot_wider(
    names_from = PaymentMeasureId, 
    values_from = c(PaymentCategory, Payment), 
    id_cols = c(FacilityName, FacilityId, State)
  )

# Check the new dataframe
dim(paymentClean)
## [1] 4645   11
head(paymentClean)
## # A tibble: 6 × 11
##   FacilityName    FacilityId State PaymentCategory_PAYM…¹ PaymentCategory_PAYM…²
##   <chr>           <chr>      <chr> <chr>                  <chr>                 
## 1 SOUTHEAST HEAL… 010001     AL    No Different Than the… No Different Than the…
## 2 MARSHALL MEDIC… 010005     AL    No Different Than the… No Different Than the…
## 3 NORTH ALABAMA … 010006     AL    Greater Than the Nati… No Different Than the…
## 4 MIZELL MEMORIA… 010007     AL    Number of Cases Too S… No Different Than the…
## 5 CRENSHAW COMMU… 010008     AL    Number of Cases Too S… Number of Cases Too S…
## 6 ST. VINCENT'S … 010011     AL    No Different Than the… No Different Than the…
## # ℹ abbreviated names: ¹​PaymentCategory_PAYM_30_AMI,
## #   ²​PaymentCategory_PAYM_30_HF
## # ℹ 6 more variables: PaymentCategory_PAYM_30_PN <chr>,
## #   PaymentCategory_PAYM_90_HIP_KNEE <chr>, Payment_PAYM_30_AMI <chr>,
## #   Payment_PAYM_30_HF <chr>, Payment_PAYM_30_PN <chr>,
## #   Payment_PAYM_90_HIP_KNEE <chr>

Joining and cleaning the datasets (AC)

Joining the datasets based on FacilityId

HipKneeClean <- readmissionsClean %>%
  full_join(HCAHPSClean, by = "FacilityId") %>%
  full_join(careClean, by = "FacilityId") %>%
  full_join(deathsClean, by = "FacilityId") %>%
  full_join(paymentClean, by = "FacilityId")

head(HipKneeClean)
## # A tibble: 6 × 451
##   FacilityName.x                  FacilityId State.x NumberOfDischarges_HIP-KN…¹
##   <chr>                           <chr>      <chr>                         <dbl>
## 1 SOUTHEAST HEALTH MEDICAL CENTER 010001     AL                               NA
## 2 MARSHALL MEDICAL CENTERS        010005     AL                               NA
## 3 NORTH ALABAMA MEDICAL CENTER    010006     AL                               NA
## 4 MIZELL MEMORIAL HOSPITAL        010007     AL                               NA
## 5 CRENSHAW COMMUNITY HOSPITAL     010008     AL                               NA
## 6 ST. VINCENT'S EAST              010011     AL                               NA
## # ℹ abbreviated name: ¹​`NumberOfDischarges_HIP-KNEE`
## # ℹ 447 more variables: `ExcessReadmissionRatio_HIP-KNEE` <dbl>,
## #   `PredictedReadmissionRate_HIP-KNEE` <dbl>,
## #   `ExpectedReadmissionRate_HIP-KNEE` <dbl>,
## #   `NumberOfReadmissions_HIP-KNEE` <dbl>, FacilityName.y <chr>, State.y <chr>,
## #   PatientSurveyStarRating_H_COMP_1_A_P <chr>,
## #   PatientSurveyStarRating_H_COMP_1_SN_P <chr>, …
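The FacilityName.x, FacilityName.y and State.y columns in this output appear because every table carries its own copy of the facility name and state while only FacilityId is used as the join key. One way to avoid the suffixed duplicates, sketched here under assumptions, is to keep the descriptive columns in a single lookup table and drop them from each table before joining. The tables a and b and their measure columns are hypothetical miniatures, not the cleaned datasets.

```r
library(dplyr)

# Hypothetical miniature versions of two cleaned tables
a <- tibble(
  FacilityId = c("010001", "010005"),
  FacilityName = c("A", "B"),
  State = "AL",
  measure_a = 1:2
)
b <- tibble(
  FacilityId = c("010001", "010006"),
  FacilityName = c("A", "C"),
  State = "AL",
  measure_b = 3:4
)

# Joining on FacilityId alone duplicates the shared descriptive columns
names(full_join(a, b, by = "FacilityId"))

# Keep one copy of the descriptive columns in a lookup table
facilities <- bind_rows(a, b) %>%
  distinct(FacilityId, FacilityName, State)

# Drop the descriptive columns before joining, then attach them once
joined <- full_join(
  select(a, -FacilityName, -State),
  select(b, -FacilityName, -State),
  by = "FacilityId"
) %>%
  left_join(facilities, by = "FacilityId")
names(joined)
```

This sketch assumes each FacilityId maps to a single name and state across tables; if spellings differ between sources, the lookup table would need to be reduced to one row per FacilityId first.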

Removing redundant columns

# Removing the suffixed duplicate columns (.x / .y) created by the joins
HipKneeClean <- HipKneeClean %>%
  select(-matches("\\.(x|y)$"))

Checking for NA Values

# Checking the dimensions
dim(HipKneeClean)

# Count NA values in each column
na_counts <- sapply(HipKneeClean, function(x) sum(is.na(x)))

# View the NA counts
print(na_counts)

Removing columns with more than 80% NA values

# Calculate the proportion (0 to 1) of NA values for each column
na_percentage <- sapply(HipKneeClean, function(x) mean(is.na(x)))

# Remove columns where more than 80% of the values are NA
HipKneeClean <- HipKneeClean[, na_percentage <= 0.8]

# Count NA values in each column
na_counts <- sapply(HipKneeClean, function(x) sum(is.na(x)))

# View the NA counts
print(na_counts)

# Check the dimensions
dim(HipKneeClean)
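The thresholding step can be checked on a small example. The sketch below uses a hypothetical data frame with columns that are 90%, 50% and 0% missing, and colMeans() as a shorthand for the per-column NA proportion.

```r
# Hypothetical data frame with known shares of missing values
toy <- data.frame(
  mostly_missing = c(rep(NA, 9), 1),        # 90% NA
  half_missing   = c(rep(NA, 5), 1:5),      # 50% NA
  complete       = 1:10                     # 0% NA
)

# Proportion of NA values per column
na_share <- colMeans(is.na(toy))
print(na_share)

# Keep only columns whose NA share does not exceed 0.8
kept <- toy[, na_share <= 0.8, drop = FALSE]
names(kept)
```

Only mostly_missing is dropped here; drop = FALSE keeps the result a data frame even if a single column survives.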

Removing answer percent and survey response rate columns

# Remove columns containing 'AnswerPercent' or 'SurveyResponseRate'
HipKneeClean <- HipKneeClean %>%
  select(-matches("AnswerPercent|SurveyResponseRate"))

# Check the dimensions
dim(HipKneeClean)
## [1] 4816   87

Removing compared to national and payment category columns

# Remove columns containing 'ComparedToNational' or 'PaymentCategory'
HipKneeClean <- HipKneeClean %>%
  select(-matches("ComparedToNational|PaymentCategory"))

# Check the dimensions
dim(HipKneeClean)
## [1] 4816   67

Checking data structure

str(HipKneeClean)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
##  $ FacilityId                                      : chr [1:4816] "010001" "010005" "010006" "010007" ...
##  $ ExcessReadmissionRatio_HIP-KNEE                 : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
##  $ PredictedReadmissionRate_HIP-KNEE               : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
##  $ ExpectedReadmissionRate_HIP-KNEE                : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
##  $ NumberOfReadmissions_HIP-KNEE                   : num [1:4816] 3 3 10 5 NA 6 10 9 NA 9 ...
##  $ PatientSurveyStarRating_H_COMP_1_STAR_RATING    : chr [1:4816] "3" "3" "2" "3" ...
##  $ PatientSurveyStarRating_H_COMP_2_STAR_RATING    : chr [1:4816] "4" "4" "3" "5" ...
##  $ PatientSurveyStarRating_H_COMP_3_STAR_RATING    : chr [1:4816] "3" "2" "2" "4" ...
##  $ PatientSurveyStarRating_H_COMP_5_STAR_RATING    : chr [1:4816] "3" "3" "2" "3" ...
##  $ PatientSurveyStarRating_H_COMP_6_STAR_RATING    : chr [1:4816] "4" "3" "3" "4" ...
##  $ PatientSurveyStarRating_H_COMP_7_STAR_RATING    : chr [1:4816] "4" "3" "2" "4" ...
##  $ PatientSurveyStarRating_H_CLEAN_STAR_RATING     : chr [1:4816] "3" "2" "1" "2" ...
##  $ PatientSurveyStarRating_H_QUIET_STAR_RATING     : chr [1:4816] "4" "4" "4" "4" ...
##  $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: chr [1:4816] "4" "3" "2" "4" ...
##  $ PatientSurveyStarRating_H_RECMND_STAR_RATING    : chr [1:4816] "4" "3" "2" "4" ...
##  $ PatientSurveyStarRating_H_STAR_RATING           : chr [1:4816] "4" "3" "2" "4" ...
##  $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE     : chr [1:4816] "89" "90" "88" "91" ...
##  $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE     : chr [1:4816] "91" "92" "89" "95" ...
##  $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE     : chr [1:4816] "81" "75" "75" "88" ...
##  $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE     : chr [1:4816] "77" "76" "71" "77" ...
##  $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE     : chr [1:4816] "87" "86" "83" "87" ...
##  $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE     : chr [1:4816] "82" "79" "77" "82" ...
##  $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE      : chr [1:4816] "84" "80" "74" "80" ...
##  $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE      : chr [1:4816] "86" "85" "85" "87" ...
##  $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : chr [1:4816] "89" "85" "82" "89" ...
##  $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE     : chr [1:4816] "90" "83" "79" "88" ...
##  $ EDV                                             : chr [1:4816] "high" "high" "high" "low" ...
##  $ ED_2_Strata_1                                   : chr [1:4816] NA "148" NA NA ...
##  $ HCP_COVID_19                                    : chr [1:4816] "80.7" "79.8" "79" "57.9" ...
##  $ IMM_3                                           : chr [1:4816] "95" "80" "67" "53" ...
##  $ OP_18b                                          : chr [1:4816] "215" "147" "177" "130" ...
##  $ OP_18c                                          : chr [1:4816] "317" "266" NA "216" ...
##  $ OP_22                                           : chr [1:4816] "5" "3" "1" "4" ...
##  $ OP_23                                           : chr [1:4816] NA NA "69" NA ...
##  $ OP_29                                           : chr [1:4816] "47" "96" "85" "23" ...
##  $ SAFE_USE_OF_OPIOIDS                             : chr [1:4816] "14" "19" "17" NA ...
##  $ SEP_1                                           : chr [1:4816] "66" "74" "56" "86" ...
##  $ SEP_SH_3HR                                      : chr [1:4816] "70" "88" "77" NA ...
##  $ SEP_SH_6HR                                      : chr [1:4816] "100" "91" "81" NA ...
##  $ SEV_SEP_3HR                                     : chr [1:4816] "79" "88" "78" "89" ...
##  $ SEV_SEP_6HR                                     : chr [1:4816] "95" "96" "86" "97" ...
##  $ STK_02                                          : chr [1:4816] "98" "100" "96" NA ...
##  $ STK_05                                          : chr [1:4816] NA "91" NA NA ...
##  $ STK_06                                          : chr [1:4816] NA NA "97" NA ...
##  $ VTE_1                                           : chr [1:4816] "98" NA NA NA ...
##  $ VTE_2                                           : chr [1:4816] "99" NA "97" NA ...
##  $ Score_COMP_HIP_KNEE                             : chr [1:4816] "2.7" "2.3" "4.6" NA ...
##  $ Score_MORT_30_AMI                               : chr [1:4816] "12" "13.6" "16.5" NA ...
##  $ Score_MORT_30_COPD                              : chr [1:4816] "8.8" "9.9" "9.9" "13.7" ...
##  $ Score_MORT_30_HF                                : chr [1:4816] "8.9" "14.9" "12.5" "12.5" ...
##  $ Score_MORT_30_PN                                : chr [1:4816] "18" "23.3" "19.5" "28.5" ...
##  $ Score_MORT_30_STK                               : chr [1:4816] "14.8" "15.3" "17.2" NA ...
##  $ Score_PSI_03                                    : chr [1:4816] "0.39" "0.94" "1.39" "0.42" ...
##  $ Score_PSI_04                                    : chr [1:4816] "184.68" "183.49" "173.63" NA ...
##  $ Score_PSI_06                                    : chr [1:4816] "0.23" "0.22" "0.36" "0.24" ...
##  $ Score_PSI_08                                    : chr [1:4816] "0.10" "0.09" "0.08" "0.09" ...
##  $ Score_PSI_09                                    : chr [1:4816] "2.39" "2.69" "5.43" "2.49" ...
##  $ Score_PSI_10                                    : chr [1:4816] "1.14" "1.37" "1.26" "1.57" ...
##  $ Score_PSI_11                                    : chr [1:4816] "13.83" "7.19" "7.37" "8.45" ...
##  $ Score_PSI_12                                    : chr [1:4816] "4.49" "3.01" "3.36" "3.89" ...
##  $ Score_PSI_13                                    : chr [1:4816] "8.05" "4.46" "4.37" "5.19" ...
##  $ Score_PSI_14                                    : chr [1:4816] "1.69" "1.87" "1.76" NA ...
##  $ Score_PSI_15                                    : chr [1:4816] "0.93" "0.91" "1.34" "1.08" ...
##  $ Score_PSI_90                                    : chr [1:4816] "1.21" "0.97" "1.17" "0.95" ...
##  $ FacilityName                                    : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
##  $ State                                           : chr [1:4816] "AL" "AL" "AL" "AL" ...
##  $ Payment_PAYM_90_HIP_KNEE                        : chr [1:4816] "$22,212" "$18,030" "$21,898" NA ...
# Convert columns to numeric
HipKneeClean <- HipKneeClean %>%
  mutate_at(vars(starts_with("PatientSurveyStarRating_"), 
                 starts_with("HcahpsLinearMeanValue_"), 
                 starts_with("Score_"),
                 starts_with("ED_"),
                 starts_with("IMM_"),
                 starts_with("OP_"),
                 starts_with("SEP_"),
                 starts_with("SEV_"),
                 starts_with("STK_"),
                 starts_with("VTE_"),
                 starts_with("SAFE_"),
                 starts_with("HCP_")),
            ~ as.numeric(as.character(.)))

# View the structure
str(HipKneeClean)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
##  $ FacilityId                                      : chr [1:4816] "010001" "010005" "010006" "010007" ...
##  $ ExcessReadmissionRatio_HIP-KNEE                 : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
##  $ PredictedReadmissionRate_HIP-KNEE               : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
##  $ ExpectedReadmissionRate_HIP-KNEE                : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
##  $ NumberOfReadmissions_HIP-KNEE                   : num [1:4816] 3 3 10 5 NA 6 10 9 NA 9 ...
##  $ PatientSurveyStarRating_H_COMP_1_STAR_RATING    : num [1:4816] 3 3 2 3 NA 3 3 3 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_2_STAR_RATING    : num [1:4816] 4 4 3 5 NA 3 4 4 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_3_STAR_RATING    : num [1:4816] 3 2 2 4 NA 4 3 2 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_5_STAR_RATING    : num [1:4816] 3 3 2 3 NA 3 3 2 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_6_STAR_RATING    : num [1:4816] 4 3 3 4 NA 3 3 2 NA 3 ...
##  $ PatientSurveyStarRating_H_COMP_7_STAR_RATING    : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
##  $ PatientSurveyStarRating_H_CLEAN_STAR_RATING     : num [1:4816] 3 2 1 2 NA 2 2 1 NA 4 ...
##  $ PatientSurveyStarRating_H_QUIET_STAR_RATING     : num [1:4816] 4 4 4 4 NA 4 4 3 NA 5 ...
##  $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: num [1:4816] 4 3 2 4 NA 3 2 3 NA 4 ...
##  $ PatientSurveyStarRating_H_RECMND_STAR_RATING    : num [1:4816] 4 3 2 4 NA 4 2 3 NA 4 ...
##  $ PatientSurveyStarRating_H_STAR_RATING           : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
##  $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE     : num [1:4816] 89 90 88 91 NA 90 91 89 NA 92 ...
##  $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE     : num [1:4816] 91 92 89 95 NA 90 91 91 NA 92 ...
##  $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE     : num [1:4816] 81 75 75 88 NA 85 80 78 NA 85 ...
##  $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE     : num [1:4816] 77 76 71 77 NA 76 76 72 NA 78 ...
##  $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE     : num [1:4816] 87 86 83 87 NA 86 86 81 NA 86 ...
##  $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE     : num [1:4816] 82 79 77 82 NA 81 79 80 NA 83 ...
##  $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE      : num [1:4816] 84 80 74 80 NA 81 83 78 NA 88 ...
##  $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE      : num [1:4816] 86 85 85 87 NA 84 84 82 NA 89 ...
##  $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : num [1:4816] 89 85 82 89 NA 88 83 85 NA 90 ...
##  $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE     : num [1:4816] 90 83 79 88 NA 87 80 84 NA 91 ...
##  $ EDV                                             : chr [1:4816] "high" "high" "high" "low" ...
##  $ ED_2_Strata_1                                   : num [1:4816] NA 148 NA NA NA NA NA NA NA NA ...
##  $ HCP_COVID_19                                    : num [1:4816] 80.7 79.8 79 57.9 81.2 88 69.8 87.3 95.9 85.3 ...
##  $ IMM_3                                           : num [1:4816] 95 80 67 53 45 81 65 93 98 81 ...
##  $ OP_18b                                          : num [1:4816] 215 147 177 130 118 206 160 185 102 145 ...
##  $ OP_18c                                          : num [1:4816] 317 266 NA 216 98 124 220 220 NA 324 ...
##  $ OP_22                                           : num [1:4816] 5 3 1 4 0 5 4 3 0 2 ...
##  $ OP_23                                           : num [1:4816] NA NA 69 NA NA 47 NA 73 NA 35 ...
##  $ OP_29                                           : num [1:4816] 47 96 85 23 67 100 100 NA NA 82 ...
##  $ SAFE_USE_OF_OPIOIDS                             : num [1:4816] 14 19 17 NA NA 20 14 23 NA 17 ...
##  $ SEP_1                                           : num [1:4816] 66 74 56 86 NA 51 92 77 NA 87 ...
##  $ SEP_SH_3HR                                      : num [1:4816] 70 88 77 NA NA 78 94 83 NA 90 ...
##  $ SEP_SH_6HR                                      : num [1:4816] 100 91 81 NA NA 81 83 100 NA 94 ...
##  $ SEV_SEP_3HR                                     : num [1:4816] 79 88 78 89 NA 69 95 85 NA 94 ...
##  $ SEV_SEP_6HR                                     : num [1:4816] 95 96 86 97 NA 91 99 97 NA 99 ...
##  $ STK_02                                          : num [1:4816] 98 100 96 NA NA 93 NA 99 NA NA ...
##  $ STK_05                                          : num [1:4816] NA 91 NA NA NA NA NA NA NA NA ...
##  $ STK_06                                          : num [1:4816] NA NA 97 NA NA NA NA NA NA NA ...
##  $ VTE_1                                           : num [1:4816] 98 NA NA NA NA 79 89 84 44 59 ...
##  $ VTE_2                                           : num [1:4816] 99 NA 97 NA NA 88 93 94 NA NA ...
##  $ Score_COMP_HIP_KNEE                             : num [1:4816] 2.7 2.3 4.6 NA NA 3.5 3.8 3.5 NA 4.3 ...
##  $ Score_MORT_30_AMI                               : num [1:4816] 12 13.6 16.5 NA NA 13.2 13.8 13.1 NA NA ...
##  $ Score_MORT_30_COPD                              : num [1:4816] 8.8 9.9 9.9 13.7 NA 10.3 NA 9.2 NA 7.8 ...
##  $ Score_MORT_30_HF                                : num [1:4816] 8.9 14.9 12.5 12.5 NA 13.5 13.6 9.9 NA 16.9 ...
##  $ Score_MORT_30_PN                                : num [1:4816] 18 23.3 19.5 28.5 NA 20.9 22 17.2 NA 26.1 ...
##  $ Score_MORT_30_STK                               : num [1:4816] 14.8 15.3 17.2 NA NA 12.3 NA 13.2 NA 17.3 ...
##  $ Score_PSI_03                                    : num [1:4816] 0.39 0.94 1.39 0.42 0.54 0.13 0.41 0.63 0.57 0.47 ...
##  $ Score_PSI_04                                    : num [1:4816] 185 183 174 NA NA ...
##  $ Score_PSI_06                                    : num [1:4816] 0.23 0.22 0.36 0.24 0.25 0.24 0.24 0.21 0.25 0.22 ...
##  $ Score_PSI_08                                    : num [1:4816] 0.1 0.09 0.08 0.09 0.09 0.08 0.09 0.09 0.09 0.09 ...
##  $ Score_PSI_09                                    : num [1:4816] 2.39 2.69 5.43 2.49 NA 1.88 2.44 3.29 2.44 2.58 ...
##  $ Score_PSI_10                                    : num [1:4816] 1.14 1.37 1.26 1.57 NA 1.72 1.51 1.2 1.57 NA ...
##  $ Score_PSI_11                                    : num [1:4816] 13.83 7.19 7.37 8.45 NA ...
##  $ Score_PSI_12                                    : num [1:4816] 4.49 3.01 3.36 3.89 NA 3.04 3.32 3.67 3.56 5.63 ...
##  $ Score_PSI_13                                    : num [1:4816] 8.05 4.46 4.37 5.19 NA 5.55 4.88 6.08 5.18 NA ...
##  $ Score_PSI_14                                    : num [1:4816] 1.69 1.87 1.76 NA NA 1.86 2.46 2.77 NA 1.83 ...
##  $ Score_PSI_15                                    : num [1:4816] 0.93 0.91 1.34 1.08 NA 1.18 1.04 0.84 NA 0.88 ...
##  $ Score_PSI_90                                    : num [1:4816] 1.21 0.97 1.17 0.95 NA 0.72 0.89 1.17 0.98 1.05 ...
##  $ FacilityName                                    : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
##  $ State                                           : chr [1:4816] "AL" "AL" "AL" "AL" ...
##  $ Payment_PAYM_90_HIP_KNEE                        : chr [1:4816] "$22,212" "$18,030" "$21,898" NA ...

Fixing the payment column

# Remove "$" and "," from every Payment_ column, then convert to numeric
HipKneeClean <- HipKneeClean %>%
  mutate(across(starts_with("Payment_"),
                ~ as.numeric(gsub("[$,]", "", .x))))
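
As a quick sanity check on the conversion above, the same pattern can be applied to a few sample strings. This is a minimal sketch in base R; the vector `payment_raw` is a hypothetical stand-in that borrows three values from the `str()` output and is not part of the original analysis.

```r
# Hypothetical sample mirroring Payment_PAYM_90_HIP_KNEE before cleaning
payment_raw <- c("$22,212", "$18,030", "$21,898", NA)

# Strip dollar signs and thousands separators, then convert to numeric.
# Missing values pass through as NA without a coercion warning.
payment_num <- as.numeric(gsub("[$,]", "", payment_raw))

print(payment_num)
```

Inside a bracket expression the dollar sign is treated literally, so no escaping is needed.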

# Checking the structure
str(HipKneeClean)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
##  $ FacilityId                                      : chr [1:4816] "010001" "010005" "010006" "010007" ...
##  $ ExcessReadmissionRatio_HIP-KNEE                 : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
##  $ PredictedReadmissionRate_HIP-KNEE               : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
##  $ ExpectedReadmissionRate_HIP-KNEE                : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
##  $ NumberOfReadmissions_HIP-KNEE                   : num [1:4816] 3 3 10 5 NA 6 10 9 NA 9 ...
##  $ PatientSurveyStarRating_H_COMP_1_STAR_RATING    : num [1:4816] 3 3 2 3 NA 3 3 3 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_2_STAR_RATING    : num [1:4816] 4 4 3 5 NA 3 4 4 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_3_STAR_RATING    : num [1:4816] 3 2 2 4 NA 4 3 2 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_5_STAR_RATING    : num [1:4816] 3 3 2 3 NA 3 3 2 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_6_STAR_RATING    : num [1:4816] 4 3 3 4 NA 3 3 2 NA 3 ...
##  $ PatientSurveyStarRating_H_COMP_7_STAR_RATING    : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
##  $ PatientSurveyStarRating_H_CLEAN_STAR_RATING     : num [1:4816] 3 2 1 2 NA 2 2 1 NA 4 ...
##  $ PatientSurveyStarRating_H_QUIET_STAR_RATING     : num [1:4816] 4 4 4 4 NA 4 4 3 NA 5 ...
##  $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: num [1:4816] 4 3 2 4 NA 3 2 3 NA 4 ...
##  $ PatientSurveyStarRating_H_RECMND_STAR_RATING    : num [1:4816] 4 3 2 4 NA 4 2 3 NA 4 ...
##  $ PatientSurveyStarRating_H_STAR_RATING           : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
##  $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE     : num [1:4816] 89 90 88 91 NA 90 91 89 NA 92 ...
##  $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE     : num [1:4816] 91 92 89 95 NA 90 91 91 NA 92 ...
##  $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE     : num [1:4816] 81 75 75 88 NA 85 80 78 NA 85 ...
##  $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE     : num [1:4816] 77 76 71 77 NA 76 76 72 NA 78 ...
##  $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE     : num [1:4816] 87 86 83 87 NA 86 86 81 NA 86 ...
##  $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE     : num [1:4816] 82 79 77 82 NA 81 79 80 NA 83 ...
##  $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE      : num [1:4816] 84 80 74 80 NA 81 83 78 NA 88 ...
##  $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE      : num [1:4816] 86 85 85 87 NA 84 84 82 NA 89 ...
##  $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : num [1:4816] 89 85 82 89 NA 88 83 85 NA 90 ...
##  $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE     : num [1:4816] 90 83 79 88 NA 87 80 84 NA 91 ...
##  $ EDV                                             : chr [1:4816] "high" "high" "high" "low" ...
##  $ ED_2_Strata_1                                   : num [1:4816] NA 148 NA NA NA NA NA NA NA NA ...
##  $ HCP_COVID_19                                    : num [1:4816] 80.7 79.8 79 57.9 81.2 88 69.8 87.3 95.9 85.3 ...
##  $ IMM_3                                           : num [1:4816] 95 80 67 53 45 81 65 93 98 81 ...
##  $ OP_18b                                          : num [1:4816] 215 147 177 130 118 206 160 185 102 145 ...
##  $ OP_18c                                          : num [1:4816] 317 266 NA 216 98 124 220 220 NA 324 ...
##  $ OP_22                                           : num [1:4816] 5 3 1 4 0 5 4 3 0 2 ...
##  $ OP_23                                           : num [1:4816] NA NA 69 NA NA 47 NA 73 NA 35 ...
##  $ OP_29                                           : num [1:4816] 47 96 85 23 67 100 100 NA NA 82 ...
##  $ SAFE_USE_OF_OPIOIDS                             : num [1:4816] 14 19 17 NA NA 20 14 23 NA 17 ...
##  $ SEP_1                                           : num [1:4816] 66 74 56 86 NA 51 92 77 NA 87 ...
##  $ SEP_SH_3HR                                      : num [1:4816] 70 88 77 NA NA 78 94 83 NA 90 ...
##  $ SEP_SH_6HR                                      : num [1:4816] 100 91 81 NA NA 81 83 100 NA 94 ...
##  $ SEV_SEP_3HR                                     : num [1:4816] 79 88 78 89 NA 69 95 85 NA 94 ...
##  $ SEV_SEP_6HR                                     : num [1:4816] 95 96 86 97 NA 91 99 97 NA 99 ...
##  $ STK_02                                          : num [1:4816] 98 100 96 NA NA 93 NA 99 NA NA ...
##  $ STK_05                                          : num [1:4816] NA 91 NA NA NA NA NA NA NA NA ...
##  $ STK_06                                          : num [1:4816] NA NA 97 NA NA NA NA NA NA NA ...
##  $ VTE_1                                           : num [1:4816] 98 NA NA NA NA 79 89 84 44 59 ...
##  $ VTE_2                                           : num [1:4816] 99 NA 97 NA NA 88 93 94 NA NA ...
##  $ Score_COMP_HIP_KNEE                             : num [1:4816] 2.7 2.3 4.6 NA NA 3.5 3.8 3.5 NA 4.3 ...
##  $ Score_MORT_30_AMI                               : num [1:4816] 12 13.6 16.5 NA NA 13.2 13.8 13.1 NA NA ...
##  $ Score_MORT_30_COPD                              : num [1:4816] 8.8 9.9 9.9 13.7 NA 10.3 NA 9.2 NA 7.8 ...
##  $ Score_MORT_30_HF                                : num [1:4816] 8.9 14.9 12.5 12.5 NA 13.5 13.6 9.9 NA 16.9 ...
##  $ Score_MORT_30_PN                                : num [1:4816] 18 23.3 19.5 28.5 NA 20.9 22 17.2 NA 26.1 ...
##  $ Score_MORT_30_STK                               : num [1:4816] 14.8 15.3 17.2 NA NA 12.3 NA 13.2 NA 17.3 ...
##  $ Score_PSI_03                                    : num [1:4816] 0.39 0.94 1.39 0.42 0.54 0.13 0.41 0.63 0.57 0.47 ...
##  $ Score_PSI_04                                    : num [1:4816] 185 183 174 NA NA ...
##  $ Score_PSI_06                                    : num [1:4816] 0.23 0.22 0.36 0.24 0.25 0.24 0.24 0.21 0.25 0.22 ...
##  $ Score_PSI_08                                    : num [1:4816] 0.1 0.09 0.08 0.09 0.09 0.08 0.09 0.09 0.09 0.09 ...
##  $ Score_PSI_09                                    : num [1:4816] 2.39 2.69 5.43 2.49 NA 1.88 2.44 3.29 2.44 2.58 ...
##  $ Score_PSI_10                                    : num [1:4816] 1.14 1.37 1.26 1.57 NA 1.72 1.51 1.2 1.57 NA ...
##  $ Score_PSI_11                                    : num [1:4816] 13.83 7.19 7.37 8.45 NA ...
##  $ Score_PSI_12                                    : num [1:4816] 4.49 3.01 3.36 3.89 NA 3.04 3.32 3.67 3.56 5.63 ...
##  $ Score_PSI_13                                    : num [1:4816] 8.05 4.46 4.37 5.19 NA 5.55 4.88 6.08 5.18 NA ...
##  $ Score_PSI_14                                    : num [1:4816] 1.69 1.87 1.76 NA NA 1.86 2.46 2.77 NA 1.83 ...
##  $ Score_PSI_15                                    : num [1:4816] 0.93 0.91 1.34 1.08 NA 1.18 1.04 0.84 NA 0.88 ...
##  $ Score_PSI_90                                    : num [1:4816] 1.21 0.97 1.17 0.95 NA 0.72 0.89 1.17 0.98 1.05 ...
##  $ FacilityName                                    : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
##  $ State                                           : chr [1:4816] "AL" "AL" "AL" "AL" ...
##  $ Payment_PAYM_90_HIP_KNEE                        : num [1:4816] 22212 18030 21898 NA NA ...

Saving the cleaned data so it can be reused without repeating the cleaning steps

save(HipKneeClean, file = "HipKneeClean.RData")

Exploring the cleaned data

Generating visualizations (SE)

Creating a summary table for numeric variables

# Select numeric columns
numeric_columns <- select(HipKneeClean, where(is.numeric))

# Calculate descriptive statistics
descr_stats <- psych::describe(numeric_columns)

# Convert to a data frame
descr_stats_df <- as.data.frame(descr_stats)

# Display the table
kable(descr_stats_df, format = "html", caption = "Table 6. Descriptive Statistics for Numeric Variables in Cleaned Dataset") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Table 6. Descriptive Statistics for Numeric Variables in Cleaned Dataset
Variable vars n mean sd median trimmed mad min max range skew kurtosis se
ExcessReadmissionRatio_HIP-KNEE 1 1838 1.004161e+00 0.1263979 0.9921 1.000079e+00 0.1119363 0.6159 1.5162 0.9003 0.3862731 0.7447069 0.0029483
PredictedReadmissionRate_HIP-KNEE 2 1838 4.546552e+00 0.9092848 4.4768 4.511130e+00 0.8590184 1.9279 8.5690 6.6411 0.4370579 0.4866461 0.0212093
ExpectedReadmissionRate_HIP-KNEE 3 1838 4.519903e+00 0.6637697 4.4544 4.484779e+00 0.6165392 2.6749 7.6240 4.9491 0.6361300 1.0010780 0.0154826
NumberOfReadmissions_HIP-KNEE 4 1838 8.098477e+00 7.8400996 7.0000 6.813859e+00 4.4478000 1.0000 125.0000 124.0000 4.5178495 40.4373466 0.1828727
PatientSurveyStarRating_H_COMP_1_STAR_RATING 5 3255 3.260215e+00 1.0059133 3.0000 3.241843e+00 1.4826000 1.0000 5.0000 4.0000 0.0239346 -0.4825494 0.0176313
PatientSurveyStarRating_H_COMP_2_STAR_RATING 6 3255 3.428264e+00 0.9474515 3.0000 3.450672e+00 1.4826000 1.0000 5.0000 4.0000 -0.3484208 -0.0771131 0.0166066
PatientSurveyStarRating_H_COMP_3_STAR_RATING 7 3255 3.372350e+00 1.0909348 4.0000 3.388100e+00 1.4826000 1.0000 5.0000 4.0000 -0.2839572 -0.8381418 0.0191216
PatientSurveyStarRating_H_COMP_5_STAR_RATING 8 3255 3.064516e+00 0.9126664 3.0000 3.062572e+00 1.4826000 1.0000 5.0000 4.0000 -0.0135291 -0.3800413 0.0159969
PatientSurveyStarRating_H_COMP_6_STAR_RATING 9 3255 3.388940e+00 0.9148777 3.0000 3.401919e+00 1.4826000 1.0000 5.0000 4.0000 -0.3167335 0.0370958 0.0160357
PatientSurveyStarRating_H_COMP_7_STAR_RATING 10 3255 3.167742e+00 0.9963691 3.0000 3.144338e+00 1.4826000 1.0000 5.0000 4.0000 -0.0232912 -0.4736871 0.0174640
PatientSurveyStarRating_H_CLEAN_STAR_RATING 11 3255 3.049770e+00 1.1197420 3.0000 3.063724e+00 1.4826000 1.0000 5.0000 4.0000 -0.1031912 -0.6984520 0.0196265
PatientSurveyStarRating_H_QUIET_STAR_RATING 12 3255 3.214132e+00 1.1166932 3.0000 3.228791e+00 1.4826000 1.0000 5.0000 4.0000 -0.1265578 -0.6910097 0.0195730
PatientSurveyStarRating_H_HSP_RATING_STAR_RATING 13 3255 3.243318e+00 0.9195166 3.0000 3.268330e+00 1.4826000 1.0000 5.0000 4.0000 -0.2786674 0.0381366 0.0161170
PatientSurveyStarRating_H_RECMND_STAR_RATING 14 3255 3.497696e+00 1.0287408 4.0000 3.554702e+00 1.4826000 1.0000 5.0000 4.0000 -0.6160258 -0.1587475 0.0180314
PatientSurveyStarRating_H_STAR_RATING 15 3255 3.295545e+00 0.9142197 3.0000 3.294818e+00 1.4826000 1.0000 5.0000 4.0000 -0.1485226 -0.2457959 0.0160242
HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE 16 3255 9.049002e+01 2.9117012 91.0000 9.063608e+01 2.9652000 77.0000 100.0000 23.0000 -0.6770638 1.4141442 0.0510354
HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE 17 3255 9.028879e+01 2.8233453 90.0000 9.039079e+01 2.9652000 76.0000 100.0000 24.0000 -0.4886223 1.1133723 0.0494867
HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE 18 3255 8.276897e+01 5.3062993 83.0000 8.280499e+01 4.4478000 61.0000 100.0000 39.0000 -0.1793486 0.4429645 0.0930071
HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE 19 3255 7.554593e+01 5.1600947 75.0000 7.547178e+01 4.4478000 51.0000 99.0000 48.0000 0.0989232 0.6200355 0.0904445
HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE 20 3255 8.568786e+01 4.1729980 86.0000 8.596238e+01 2.9652000 59.0000 100.0000 41.0000 -0.9067227 2.2178795 0.0731430
HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE 21 3255 8.040277e+01 3.2911839 81.0000 8.046795e+01 2.9652000 64.0000 97.0000 33.0000 -0.2792695 1.0404404 0.0576868
HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE 22 3255 8.560799e+01 4.7038213 86.0000 8.575931e+01 4.4478000 68.0000 99.0000 31.0000 -0.3531253 0.1936870 0.0824471
HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE 23 3255 8.172657e+01 5.6927220 82.0000 8.193282e+01 5.9304000 56.0000 99.0000 43.0000 -0.3843168 0.2536999 0.0997802
HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 24 3255 8.709708e+01 4.0649053 88.0000 8.730940e+01 2.9652000 65.0000 98.0000 33.0000 -0.6979236 1.3325347 0.0712484
HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE 25 3255 8.607343e+01 5.2109599 87.0000 8.638349e+01 4.4478000 57.0000 99.0000 42.0000 -0.7407627 1.2385658 0.0913361
ED_2_Strata_1 26 1107 1.064002e+02 114.0608548 74.0000 8.602368e+01 50.4084000 0.0000 1078.0000 1078.0000 3.9796326 22.8953094 3.4281736
HCP_COVID_19 27 3633 8.767556e+01 10.6218376 90.1000 8.903068e+01 9.4886400 0.5000 100.0000 99.5000 -1.4380632 3.5511860 0.1762248
IMM_3 28 4140 7.782681e+01 18.5753061 83.0000 8.024245e+01 17.7912000 0.0000 100.0000 100.0000 -1.0622115 0.7180171 0.2886927
OP_18b 29 4067 1.617780e+02 54.6367977 153.0000 1.572833e+02 53.3736000 38.0000 587.0000 549.0000 0.9959452 2.0284358 0.8567382
OP_18c 30 3098 2.967434e+02 177.2966416 255.0000 2.700827e+02 117.1254000 40.0000 2954.0000 2914.0000 3.5082626 28.6044027 3.1853694
OP_22 31 3841 2.385056e+00 2.3270098 2.0000 2.028637e+00 1.4826000 0.0000 19.0000 19.0000 1.7797727 4.6722341 0.0375471
OP_23 32 1535 7.062801e+01 19.2269197 74.0000 7.247193e+01 17.7912000 0.0000 100.0000 100.0000 -0.9377173 0.8863709 0.4907446
OP_29 33 2830 9.125230e+01 14.2952983 96.0000 9.458083e+01 5.9304000 0.0000 100.0000 100.0000 -3.1390450 11.9252917 0.2687200
SAFE_USE_OF_OPIOIDS 34 3670 1.561226e+01 5.6808277 15.0000 1.537568e+01 4.4478000 0.0000 45.0000 45.0000 0.6530895 2.0207900 0.0937732
SEP_1 35 3097 5.982661e+01 16.7144073 61.0000 6.045180e+01 16.3086000 0.0000 100.0000 100.0000 -0.4029034 0.1050294 0.3003450
SEP_SH_3HR 36 2620 6.724809e+01 17.8935243 68.0000 6.776813e+01 19.2738000 0.0000 100.0000 100.0000 -0.2914787 -0.2730680 0.3495789
SEP_SH_6HR 37 2039 8.305983e+01 15.4244924 87.0000 8.529455e+01 11.8608000 7.0000 100.0000 93.0000 -1.5023285 2.7030480 0.3415877
SEV_SEP_3HR 38 3086 7.904342e+01 11.1773414 81.0000 7.998907e+01 10.3782000 0.0000 100.0000 100.0000 -1.4060736 4.9567276 0.2012058
SEV_SEP_6HR 39 2937 8.871263e+01 11.3390205 92.0000 9.061974e+01 7.4130000 0.0000 100.0000 100.0000 -2.4051707 8.9993568 0.2092298
STK_02 40 1537 9.529733e+01 6.1635225 97.0000 9.638262e+01 2.9652000 23.0000 100.0000 77.0000 -4.7304997 35.8413538 0.1572143
STK_05 41 1094 9.278702e+01 7.1402994 94.0000 9.367352e+01 4.4478000 2.0000 100.0000 98.0000 -5.4512524 54.5350543 0.2158777
STK_06 42 1298 9.464946e+01 7.5064354 96.0000 9.581154e+01 2.9652000 0.0000 100.0000 100.0000 -7.3728956 77.6048723 0.2083514
VTE_1 43 2216 8.246435e+01 19.1503872 89.0000 8.603777e+01 11.8608000 0.0000 100.0000 100.0000 -1.7514646 3.0387995 0.4068110
VTE_2 44 1413 9.383015e+01 9.7362977 97.0000 9.588241e+01 2.9652000 3.0000 100.0000 97.0000 -4.1429487 23.6985405 0.2590137
Score_COMP_HIP_KNEE 45 2090 3.182392e+00 0.5482694 3.1000 3.150419e+00 0.4447800 1.6000 6.2000 4.6000 0.7716603 1.9037431 0.0119928
Score_MORT_30_AMI 46 1943 1.254359e+01 1.1553168 12.5000 1.251608e+01 1.0378200 8.9000 17.1000 8.2000 0.2785565 0.5897728 0.0262099
Score_MORT_30_COPD 47 2569 9.185286e+00 1.3614554 9.1000 9.121196e+00 1.3343400 5.2000 14.9000 9.7000 0.5044944 0.5326934 0.0268609
Score_MORT_30_HF 48 3056 1.182863e+01 1.9384358 11.8000 1.180581e+01 1.7791200 5.5000 20.4000 14.9000 0.1359787 0.4028740 0.0350651
Score_MORT_30_PN 49 3514 1.833056e+01 2.5441335 18.2000 1.826543e+01 2.3721600 8.6000 29.5000 20.9000 0.3130182 0.5748975 0.0429180
Score_MORT_30_STK 50 2123 1.379157e+01 1.8194129 13.7000 1.371648e+01 1.7791200 8.0000 21.9000 13.9000 0.4400162 0.5676934 0.0394872
Score_PSI_03 51 3169 5.805491e-01 0.4702323 0.4800 5.037288e-01 0.2372160 0.0500 6.3100 6.2600 4.0520349 30.4735061 0.0083532
Score_PSI_04 52 1609 1.687290e+02 21.3153769 167.7400 1.687267e+02 20.2523160 86.6800 241.8100 155.1300 -0.0315953 0.4882789 0.5313920
Score_PSI_06 53 3188 2.476851e-01 0.0402023 0.2400 2.442712e-01 0.0296520 0.1200 0.5100 0.3900 1.1937679 3.4906535 0.0007120
Score_PSI_08 54 3189 9.043270e-02 0.0070889 0.0900 9.019980e-02 0.0000000 0.0600 0.1300 0.0700 0.5878153 2.8393103 0.0001255
Score_PSI_09 55 2930 2.508707e+00 0.4395922 2.4600 2.478486e+00 0.2668680 1.1000 6.1000 5.0000 1.3622221 5.9305415 0.0081211
Score_PSI_10 56 2593 1.569626e+00 0.3418816 1.5300 1.535055e+00 0.1186080 0.4700 4.5500 4.0800 1.9801686 8.7853292 0.0067139
Score_PSI_11 57 2603 9.045517e+00 3.2148329 8.3900 8.740322e+00 2.1201180 2.7300 66.8500 64.1200 4.3362544 54.8289666 0.0630117
Score_PSI_12 58 2935 3.597278e+00 0.7194093 3.5000 3.542005e+00 0.5633880 1.6100 7.5100 5.9000 1.0157663 2.2968831 0.0132792
Score_PSI_13 59 2549 5.298133e+00 0.9887454 5.1300 5.224669e+00 0.7116480 2.1700 13.4900 11.3200 1.1662305 4.3263395 0.0195839
Score_PSI_14 60 2592 2.010590e+00 0.3338405 1.9400 1.969769e+00 0.1482600 0.8900 4.4000 3.5100 1.9779060 7.1675818 0.0065572
Score_PSI_15 61 2916 1.101708e+00 0.2939729 1.0500 1.067549e+00 0.1630860 0.3500 3.4300 3.0800 1.8219347 6.3874487 0.0054439
Score_PSI_90 62 3011 1.001588e+00 0.1793301 0.9700 9.839477e-01 0.1186080 0.5500 2.7400 2.1900 2.0610890 10.6309961 0.0032681
Payment_PAYM_90_HIP_KNEE 63 2001 2.105813e+04 2079.2072318 20899.0000 2.093031e+04 1756.8810000 15936.0000 48153.0000 32217.0000 1.7757439 15.5385065 46.4808683
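
The `n` and `se` columns of Table 6 can be cross-checked by hand. The sketch below assumes that `se` is the standard error of the mean, sd / sqrt(n), and uses values copied from the ExcessReadmissionRatio_HIP-KNEE row; the variable names are illustrative only.

```r
# Values copied from the ExcessReadmissionRatio_HIP-KNEE row of Table 6
n_obs    <- 1838
sd_value <- 0.1263979

# Standard error of the mean: sd / sqrt(n)
se_value <- sd_value / sqrt(n_obs)
print(round(se_value, 7))

# n counts only non-missing values: total rows minus the rows
# that the histogram warning reports as removed
n_from_missing <- 4816 - 2978
print(n_from_missing)
```

Because `n` differs by variable (from 1,094 to 4,140 in Table 6), the summary statistics for different measures are computed on different subsets of hospitals.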

Exploring categorical variables

# Visualizing the distribution of EDV (Emergency Department Volume)
ggplot(HipKneeClean, aes(x = EDV)) +
  geom_bar(fill = "skyblue", color = "black", alpha = 0.7) +
  labs(title = "Figure 1. Distribution of Emergency Department Volume",
       x = "EDV",
       y = "Count") +
  theme_minimal() + 
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank())

Visualizing the number of facilities per state

# Data preparation
facility_counts <- HipKneeClean %>%
  group_by(State) %>%
  summarise(Count = n(), .groups = 'drop')

# Check the first few rows
head(facility_counts)
## # A tibble: 6 × 2
##   State Count
##   <chr> <int>
## 1 AK       21
## 2 AL       88
## 3 AR       79
## 4 AS        1
## 5 AZ       82
## 6 CA      327
# Get state boundaries
states_map <- map_data("state")

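# Create a mapping from state abbreviations to lowercase full state names.
# state.abb covers only the 50 states, so DC is added to match map_data("state").
state_mapping <- data.frame(
  State = c(state.abb, "DC"),
  full_state_name = c(tolower(state.name), "district of columbia"),
  stringsAsFactors = FALSE
)

# Add full state names to facility_counts
# (inner join: codes without a mapping, such as AS, are dropped)
facility_counts <- merge(facility_counts, state_mapping, by = "State")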

# Join facility counts with state map data
facility_map_data <- left_join(states_map, facility_counts, by = c("region" = "full_state_name"))

# Replace NA values with 0 in the Count column
facility_map_data$Count[is.na(facility_map_data$Count)] <- 0

# Plot the map with facility counts
ggplot(data = facility_map_data) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = Count), color = "white") +
  scale_fill_gradient(low = "lightblue", high = "darkblue", na.value = "grey50", name = "Facility Count") +
  theme_minimal() +
  labs(title = "Figure 2. Number of Facilities per State") +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        plot.background = element_blank())
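
Because `merge()` performs an inner join by default, any `State` code without an entry in the lookup table is dropped before the map is drawn. A minimal sketch for listing codes that the built-in `state.abb` vector does not cover is shown below; the `codes` vector is a hypothetical sample (only "AS" is confirmed by the output above), not the full set of values in the data.

```r
# Hypothetical sample of State codes; "AS" appears in the output above,
# "DC" and NA are assumed here for illustration
codes <- c("AK", "AL", "AS", "DC", NA)

# Codes with no match among the 50 state abbreviations
unmatched <- setdiff(codes, state.abb)
print(unmatched)
```

Running the same check on `unique(HipKneeClean$State)` would show which facilities are excluded from Figure 2.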

Visualizing the average PredictedReadmissionRate_HIP-KNEE per state

# Rename the column so its name no longer contains a hyphen (which requires backticks)
HipKneeClean <- HipKneeClean %>%
  rename(PredictedReadmissionRate_HIP_KNEE = `PredictedReadmissionRate_HIP-KNEE`)

# Calculate the average PredictedReadmissionRate_HIP_KNEE per state
average_readmission_rate <- HipKneeClean %>%
  group_by(State) %>%
  summarize(Average_PredictedReadmissionRate_HIP_KNEE = mean(PredictedReadmissionRate_HIP_KNEE, na.rm = TRUE))

# Add full state names to the average readmission rate data
average_readmission_rate <- merge(average_readmission_rate, state_mapping, by.x = "State", by.y = "State")

# Join average readmission rate with state map data
readmission_map_data <- left_join(states_map, average_readmission_rate, by = c("region" = "full_state_name"))

# Plot the map with average readmission rates
ggplot(data = readmission_map_data) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = Average_PredictedReadmissionRate_HIP_KNEE), color = "white") +
  scale_fill_gradient(low = "lightgreen", high = "darkgreen", name = "Average Predicted\nReadmission Rate") +
  theme_minimal() +
  labs(title = "Figure 3. Average Predicted Readmission Rate for Hip/Knee Replacement per State") +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank(),
        plot.background = element_blank())

Visualizing the spread of the target variable (PredictedReadmissionRate_HIP_KNEE)

# Create a histogram of PredictedReadmissionRate_HIP_KNEE
ggplot(HipKneeClean, aes(x = PredictedReadmissionRate_HIP_KNEE)) +
  geom_histogram(binwidth = 0.25, fill = "skyblue", color = "black") +
  labs(title = "Figure 4. Histogram of Predicted Readmission Rate for Hip/Knee Replacement",
       x = "Predicted Readmission Rate",
       y = "Frequency") +
  theme_minimal() + 
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank())
## Warning: Removed 2978 rows containing non-finite values (`stat_bin()`).

Creating a table of missing values

# Calculate missing values
missing_values_summary <- HipKneeClean %>%
  summarise(across(everything(), ~ sum(is.na(.)))) %>%
  pivot_longer(cols = everything(), names_to = "Variable", values_to = "Missing_Count") %>%
  mutate(Missing_Percentage = (Missing_Count / nrow(HipKneeClean)) * 100)

# Print the table using kable
missing_values_summary %>%
  kable(caption = "Table 7. Missing Values Summary") %>%
  kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
Table 7. Missing Values Summary
Variable Missing_Count Missing_Percentage
FacilityId 0 0.000000
ExcessReadmissionRatio_HIP-KNEE 2978 61.835548
PredictedReadmissionRate_HIP_KNEE 2978 61.835548
ExpectedReadmissionRate_HIP-KNEE 2978 61.835548
NumberOfReadmissions_HIP-KNEE 2978 61.835548
PatientSurveyStarRating_H_COMP_1_STAR_RATING 1561 32.412791
PatientSurveyStarRating_H_COMP_2_STAR_RATING 1561 32.412791
PatientSurveyStarRating_H_COMP_3_STAR_RATING 1561 32.412791
PatientSurveyStarRating_H_COMP_5_STAR_RATING 1561 32.412791
PatientSurveyStarRating_H_COMP_6_STAR_RATING 1561 32.412791
PatientSurveyStarRating_H_COMP_7_STAR_RATING 1561 32.412791
PatientSurveyStarRating_H_CLEAN_STAR_RATING 1561 32.412791
PatientSurveyStarRating_H_QUIET_STAR_RATING 1561 32.412791
PatientSurveyStarRating_H_HSP_RATING_STAR_RATING 1561 32.412791
PatientSurveyStarRating_H_RECMND_STAR_RATING 1561 32.412791
PatientSurveyStarRating_H_STAR_RATING 1561 32.412791
HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE 1561 32.412791
HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE 1561 32.412791
HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE 1561 32.412791
HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE 1561 32.412791
HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE 1561 32.412791
HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE 1561 32.412791
HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE 1561 32.412791
HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE 1561 32.412791
HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 1561 32.412791
HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE 1561 32.412791
EDV 972 20.182724
ED_2_Strata_1 3709 77.014120
HCP_COVID_19 1183 24.563954
IMM_3 676 14.036545
OP_18b 749 15.552326
OP_18c 1718 35.672758
OP_22 975 20.245017
OP_23 3281 68.127076
OP_29 1986 41.237541
SAFE_USE_OF_OPIOIDS 1146 23.795681
SEP_1 1719 35.693522
SEP_SH_3HR 2196 45.598007
SEP_SH_6HR 2777 57.661960
SEV_SEP_3HR 1730 35.921927
SEV_SEP_6HR 1879 39.015781
STK_02 3279 68.085548
STK_05 3722 77.284053
STK_06 3518 73.048173
VTE_1 2600 53.986711
VTE_2 3403 70.660299
Score_COMP_HIP_KNEE 2726 56.602990
Score_MORT_30_AMI 2873 59.655316
Score_MORT_30_COPD 2247 46.656977
Score_MORT_30_HF 1760 36.544851
Score_MORT_30_PN 1302 27.034884
Score_MORT_30_STK 2693 55.917774
Score_PSI_03 1647 34.198505
Score_PSI_04 3207 66.590532
Score_PSI_06 1628 33.803987
Score_PSI_08 1627 33.783223
Score_PSI_09 1886 39.161130
Score_PSI_10 2223 46.158638
Score_PSI_11 2213 45.950997
Score_PSI_12 1881 39.057309
Score_PSI_13 2267 47.072259
Score_PSI_14 2224 46.179402
Score_PSI_15 1900 39.451827
Score_PSI_90 1805 37.479236
FacilityName 171 3.550664
State 171 3.550664
Payment_PAYM_90_HIP_KNEE 2815 58.450997
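
The percentages in Table 7 are the missing count divided by the number of rows (4,816), times 100. The base R sketch below repeats that calculation on a hypothetical toy data frame and then reproduces the value reported for the readmission measures; `toy` and its columns are illustrative and not part of the dataset.

```r
# Hypothetical toy data frame with missing values
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = 1:4
)

# Count and percentage of missing values per column
missing_count <- colSums(is.na(toy))
missing_pct   <- missing_count / nrow(toy) * 100

print(data.frame(Variable = names(missing_count),
                 Missing_Count = as.integer(missing_count),
                 Missing_Percentage = as.numeric(missing_pct)))

# Same arithmetic with the Table 7 figures for the readmission measures
table7_pct <- 2978 / 4816 * 100
print(round(table7_pct, 6))
```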

Assessing collinearity

# Compute correlation matrix
cor_matrix <- cor(HipKneeClean %>% select(where(is.numeric)), use = "pairwise.complete.obs")

# Melt the correlation matrix into a long format
cor_melted <- melt(cor_matrix)

# Plot the heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Figure 5. Correlation Heatmap of Numeric Variables")
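
With 63 numeric variables the heatmap is dense, so it can help to list the strongly correlated pairs directly. The sketch below shows one possible approach on hypothetical toy data; the names `toy`, `cor_pairs`, and `high_pairs` and the 0.9 cutoff are assumptions made for illustration, not choices taken from the analysis.

```r
# Hypothetical toy data: x2 is nearly a copy of x1, x3 is independent noise
set.seed(42)
x1 <- rnorm(200)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(200, sd = 0.1),
  x3 = rnorm(200)
)

toy_cor <- cor(toy, use = "pairwise.complete.obs")

# Keep each pair once by looking only at the upper triangle
upper_idx <- which(upper.tri(toy_cor), arr.ind = TRUE)
cor_pairs <- data.frame(
  Var1 = rownames(toy_cor)[upper_idx[, "row"]],
  Var2 = colnames(toy_cor)[upper_idx[, "col"]],
  r    = toy_cor[upper_idx]
)

# 0.9 is an illustrative cutoff for flagging near-duplicate variables
high_pairs <- cor_pairs[abs(cor_pairs$r) > 0.9, ]
print(high_pairs[order(-abs(high_pairs$r)), ])
```

Applied to `cor_matrix`, a filter like this would surface pairs such as each HCAHPS star rating and its matching linear score, which correlate at roughly 0.94 in Table 8.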

# Convert the correlation matrix to a data frame
cor_table <- as.data.frame(cor_matrix)

# Add variable names as a column for better readability
cor_table$Variable <- rownames(cor_table)

# Reorder columns for better readability
cor_table <- cor_table %>%
  select(Variable, everything())

# Print the table using kable
cor_table %>%
  kable(caption = "Table 8. Correlation Coefficients Table") %>%
  kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
Table 8. Correlation Coefficients Table
Variable ExcessReadmissionRatio_HIP-KNEE PredictedReadmissionRate_HIP_KNEE ExpectedReadmissionRate_HIP-KNEE NumberOfReadmissions_HIP-KNEE PatientSurveyStarRating_H_COMP_1_STAR_RATING PatientSurveyStarRating_H_COMP_2_STAR_RATING PatientSurveyStarRating_H_COMP_3_STAR_RATING PatientSurveyStarRating_H_COMP_5_STAR_RATING PatientSurveyStarRating_H_COMP_6_STAR_RATING PatientSurveyStarRating_H_COMP_7_STAR_RATING PatientSurveyStarRating_H_CLEAN_STAR_RATING PatientSurveyStarRating_H_QUIET_STAR_RATING PatientSurveyStarRating_H_HSP_RATING_STAR_RATING PatientSurveyStarRating_H_RECMND_STAR_RATING PatientSurveyStarRating_H_STAR_RATING HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE ED_2_Strata_1 HCP_COVID_19 IMM_3 OP_18b OP_18c OP_22 OP_23 OP_29 SAFE_USE_OF_OPIOIDS SEP_1 SEP_SH_3HR SEP_SH_6HR SEV_SEP_3HR SEV_SEP_6HR STK_02 STK_05 STK_06 VTE_1 VTE_2 Score_COMP_HIP_KNEE Score_MORT_30_AMI Score_MORT_30_COPD Score_MORT_30_HF Score_MORT_30_PN Score_MORT_30_STK Score_PSI_03 Score_PSI_04 Score_PSI_06 Score_PSI_08 Score_PSI_09 Score_PSI_10 Score_PSI_11 Score_PSI_12 Score_PSI_13 Score_PSI_14 Score_PSI_15 Score_PSI_90 Payment_PAYM_90_HIP_KNEE
ExcessReadmissionRatio_HIP-KNEE ExcessReadmissionRatio_HIP-KNEE 1.0000000 0.6851738 0.0934639 0.0280292 -0.1590972 -0.1749847 -0.1597408 -0.1551935 -0.1771868 -0.1941803 -0.1160735 -0.1047116 -0.1709175 -0.1659007 -0.1783759 -0.1666746 -0.1802262 -0.1831908 -0.1775098 -0.1844376 -0.1877798 -0.1223047 -0.1123794 -0.1779322 -0.1852873 0.0762327 -0.0508152 -0.0583326 0.0505288 0.0838395 0.0252830 0.0485500 -0.0296026 0.0627005 -0.0309171 -0.0529663 0.0313734 -0.0322822 -0.0278551 -0.0434539 -0.0205857 -0.0425563 0.0039309 0.0707936 0.4350513 0.0392653 -0.0452304 -0.0577013 -0.0046351 -0.0025521 -0.0058883 0.0129772 0.0156305 0.0386634 -0.0459321 0.0099239 0.1103031 0.0939499 0.1262199 -0.0138581 -0.0019014 0.0882354 0.2740999
PredictedReadmissionRate_HIP_KNEE PredictedReadmissionRate_HIP_KNEE 0.6851738 1.0000000 0.7840403 -0.0298799 -0.2144148 -0.2287902 -0.2138329 -0.2191170 -0.2029456 -0.2247172 -0.1981594 -0.1748506 -0.2016141 -0.1642097 -0.2308937 -0.2067264 -0.2352248 -0.2449572 -0.2546502 -0.2079982 -0.2250397 -0.2045779 -0.1801665 -0.2060912 -0.1911254 0.1082272 -0.0563082 -0.0028840 0.1295727 0.0877808 0.0501727 0.0614226 -0.0106510 0.1063002 -0.0326387 -0.0689850 0.0231777 -0.0375511 -0.0014846 0.0066866 -0.0801055 -0.0101158 0.0654668 0.1064994 0.3208550 0.0074065 -0.0794948 -0.1067828 -0.0985660 -0.0376746 -0.0037334 -0.0449077 0.0154891 -0.0214412 -0.0182303 0.0710046 0.1130121 0.1047402 0.1193336 0.0140012 -0.0158282 0.0973882 0.2975679
ExpectedReadmissionRate_HIP-KNEE ExpectedReadmissionRate_HIP-KNEE 0.0934639 0.7840403 1.0000000 -0.0742574 -0.1696704 -0.1755397 -0.1644998 -0.1788949 -0.1353119 -0.1533717 -0.1829249 -0.1621036 -0.1408372 -0.0923845 -0.1744075 -0.1524571 -0.1798454 -0.1926175 -0.2093859 -0.1348924 -0.1608376 -0.1848471 -0.1649195 -0.1425272 -0.1150026 0.0843876 -0.0316503 0.0435410 0.1403298 0.0491515 0.0510729 0.0366382 0.0098417 0.0946020 -0.0189720 -0.0503969 0.0048063 -0.0227213 0.0212143 0.0415201 -0.0867129 0.0212012 0.0866767 0.0781368 0.0742579 -0.0287661 -0.0683850 -0.0967183 -0.1319780 -0.0478311 0.0027232 -0.0749480 0.0017920 -0.0595834 0.0136214 0.0894074 0.0639191 0.0654692 0.0603447 0.0328153 -0.0242059 0.0626347 0.1808580
NumberOfReadmissions_HIP-KNEE NumberOfReadmissions_HIP-KNEE 0.0280292 -0.0298799 -0.0742574 1.0000000 0.0798078 0.0740972 0.0227895 0.0398242 0.0599664 0.1063912 0.0109790 0.0177193 0.1181224 0.1391159 0.0779986 0.0692166 0.0716989 0.0290342 0.0495607 0.0542992 0.1106351 0.0187289 0.0248142 0.1116353 0.1516789 0.0425263 0.0450149 0.0255192 0.0895220 0.0668517 0.0478896 -0.0553207 -0.0126112 0.0749623 0.0004693 -0.0353431 0.0408255 0.0074947 0.0166986 0.0343487 -0.0087498 0.0347316 0.0586463 0.0734882 -0.1517362 -0.1650703 -0.0843669 -0.1375901 -0.1231721 -0.1345921 -0.0339593 -0.0798121 -0.0499866 -0.1384081 -0.0209899 -0.0728092 -0.0689026 -0.0448292 -0.0832889 -0.0662610 -0.0665941 -0.0942117 -0.1234255
PatientSurveyStarRating_H_COMP_1_STAR_RATING PatientSurveyStarRating_H_COMP_1_STAR_RATING -0.1590972 -0.2144148 -0.1696704 0.0798078 1.0000000 0.7652622 0.8092146 0.7817407 0.6947694 0.8094552 0.5857397 0.6176477 0.8113195 0.7410800 0.8821074 0.9413741 0.7947199 0.8439662 0.8192660 0.7212981 0.8279021 0.6062342 0.6396807 0.8361084 0.7761632 -0.3811431 -0.0217341 0.1991033 -0.3200784 -0.1816118 -0.2152685 0.0440556 0.0700383 0.1401327 0.1288469 0.0438649 -0.0683424 0.1642083 0.1503950 0.2176041 0.1753654 0.1870897 -0.0859359 -0.0029321 -0.0761301 -0.0479940 -0.0124762 0.1078488 0.0183801 -0.0132887 -0.0219430 -0.0320167 -0.0184090 0.0046992 0.0835767 -0.0402520 -0.1459187 -0.0669148 -0.1424419 0.0017770 0.0329127 -0.1122411 -0.2121841
PatientSurveyStarRating_H_COMP_2_STAR_RATING PatientSurveyStarRating_H_COMP_2_STAR_RATING -0.1749847 -0.2287902 -0.1755397 0.0740972 0.7652622 1.0000000 0.6820394 0.7310740 0.6334942 0.7745158 0.4946512 0.5958859 0.7576397 0.7012869 0.8135436 0.7932584 0.9499147 0.7139678 0.7675074 0.6648163 0.8027744 0.5120334 0.6187305 0.7826816 0.7365285 -0.2976284 0.0206259 0.2149395 -0.2473453 -0.1452767 -0.1462753 0.0003840 0.0721589 0.0759631 0.0920878 0.0418658 -0.0697479 0.1295914 0.0899079 0.1940882 0.1763581 0.1579604 -0.0887946 -0.0231504 -0.0615070 -0.0650654 -0.0163367 0.0722015 -0.0021513 -0.0165143 -0.0016914 -0.0041080 0.0301520 -0.0171646 0.0956403 -0.0267891 -0.1367919 -0.0390690 -0.1430591 -0.0079719 0.0359419 -0.0871039 -0.1893123
PatientSurveyStarRating_H_COMP_3_STAR_RATING PatientSurveyStarRating_H_COMP_3_STAR_RATING -0.1597408 -0.2138329 -0.1644998 0.0227895 0.8092146 0.6820394 1.0000000 0.7583026 0.6569534 0.7355634 0.5956464 0.6138696 0.7478412 0.6650712 0.8260329 0.8314522 0.7052054 0.9423558 0.7878299 0.6812130 0.7546488 0.6155860 0.6319284 0.7774308 0.6961148 -0.3896187 -0.0747053 0.1498815 -0.3981251 -0.2241446 -0.2360570 0.0486530 0.0507759 0.0900327 0.1321630 0.0521192 -0.0725481 0.1605381 0.1368127 0.1201354 0.1413712 0.0905514 -0.0996751 -0.0477865 -0.0405313 -0.0177776 0.0302290 0.1568119 0.0384967 0.0435642 -0.0192053 -0.0016542 0.0274588 0.0159553 0.0859168 -0.0262995 -0.1292050 -0.0865925 -0.1330731 0.0092188 0.0480143 -0.1032756 -0.1564699
PatientSurveyStarRating_H_COMP_5_STAR_RATING PatientSurveyStarRating_H_COMP_5_STAR_RATING -0.1551935 -0.2191170 -0.1788949 0.0398242 0.7817407 0.7310740 0.7583026 1.0000000 0.6659220 0.7694318 0.5769327 0.5922228 0.7502954 0.6685334 0.8320006 0.8038535 0.7587947 0.7931167 0.9410632 0.6945471 0.7903879 0.5987567 0.6129310 0.7793734 0.7066986 -0.3575914 -0.0062821 0.1693415 -0.3234673 -0.1918556 -0.1939137 0.0438526 0.0664748 0.0784938 0.1324347 0.0587805 -0.0562912 0.1606626 0.1423359 0.1358181 0.1964279 0.1045092 -0.0999738 -0.0151359 -0.0450744 -0.0510480 -0.0130589 0.0644025 -0.0046993 0.0257903 -0.0221085 -0.0185639 0.0273930 -0.0011887 0.0897240 -0.0322016 -0.1375039 -0.0537461 -0.1257130 -0.0064032 0.0443475 -0.1023631 -0.1576660
PatientSurveyStarRating_H_COMP_6_STAR_RATING PatientSurveyStarRating_H_COMP_6_STAR_RATING -0.1771868 -0.2029456 -0.1353119 0.0599664 0.6947694 0.6334942 0.6569534 0.6659220 1.0000000 0.7186423 0.4784757 0.4292218 0.6769019 0.6409371 0.7586740 0.7386381 0.6584544 0.6847212 0.7015421 0.9400388 0.7549683 0.4999730 0.4567781 0.7098506 0.6770439 -0.3008610 0.0354071 0.2404893 -0.2071783 -0.1351116 -0.1225568 0.0625152 0.1045421 0.1184473 0.1443955 0.0557260 -0.0637431 0.1782560 0.1758781 0.2091327 0.2034132 0.1930126 0.0235381 0.0288319 -0.0990420 -0.0682603 0.0034724 0.1194712 -0.0343594 0.0159051 0.0012755 0.0047573 0.0312923 -0.0120380 0.0883277 -0.0190535 -0.1484276 -0.0612852 -0.1424055 -0.0023314 0.0662158 -0.0911960 -0.2089917
PatientSurveyStarRating_H_COMP_7_STAR_RATING PatientSurveyStarRating_H_COMP_7_STAR_RATING -0.1941803 -0.2247172 -0.1533717 0.1063912 0.8094552 0.7745158 0.7355634 0.7694318 0.7186423 1.0000000 0.5720636 0.6101555 0.8272215 0.7939928 0.8743494 0.8277780 0.7995932 0.7605870 0.8011319 0.7433629 0.9482189 0.5929605 0.6371231 0.8571093 0.8310741 -0.3553994 0.0397427 0.2374707 -0.2572208 -0.1626711 -0.2149719 0.0359131 0.0858822 0.1201701 0.1393844 0.0474753 -0.0250525 0.1732976 0.1422983 0.2157600 0.1380616 0.1802407 -0.0364693 0.0393084 -0.1067242 -0.1098730 -0.0673445 0.0151872 -0.0880905 -0.0653713 -0.0300329 -0.0817158 -0.0009316 -0.0348072 0.0827573 -0.0399586 -0.1668264 -0.0670446 -0.1593474 -0.0164980 0.0357574 -0.1311288 -0.1977109
PatientSurveyStarRating_H_CLEAN_STAR_RATING PatientSurveyStarRating_H_CLEAN_STAR_RATING -0.1160735 -0.1981594 -0.1829249 0.0109790 0.5857397 0.4946512 0.5956464 0.5769327 0.4784757 0.5720636 1.0000000 0.4987457 0.5965227 0.5237951 0.6391671 0.5928460 0.5111389 0.6220272 0.5982789 0.4927072 0.5781200 0.9570846 0.5105668 0.6248221 0.5505967 -0.3195008 -0.0225176 0.0814169 -0.3267508 -0.1804160 -0.2479612 0.0190913 0.0137026 0.0598688 0.1690161 0.0858056 -0.0192334 0.1869822 0.1376097 0.0367533 0.0909053 0.0153703 -0.0848392 -0.0027352 -0.0571772 -0.0630079 -0.0352455 0.0665845 0.0009844 -0.0651509 -0.0629368 -0.1272277 -0.0390292 0.0016051 -0.0069280 -0.0851994 -0.1286917 -0.0807406 -0.1231640 -0.0459005 -0.0007875 -0.1463724 -0.0409475
PatientSurveyStarRating_H_QUIET_STAR_RATING PatientSurveyStarRating_H_QUIET_STAR_RATING -0.1047116 -0.1748506 -0.1621036 0.0177193 0.6176477 0.5958859 0.6138696 0.5922228 0.4292218 0.6101555 0.4987457 1.0000000 0.6313199 0.5470896 0.6730863 0.6317832 0.6199984 0.6414418 0.6171349 0.4395790 0.6249789 0.5123475 0.9556614 0.6537481 0.5755335 -0.3375393 -0.1460561 0.0866767 -0.3615058 -0.1742767 -0.2095914 0.0018524 -0.0044671 0.0303764 0.0919836 0.0246699 -0.0291912 0.1003528 0.1010108 0.0765341 0.1069590 0.0440800 -0.0897193 -0.0754020 -0.0250746 0.0428372 0.0722857 0.1510853 0.0902457 0.0337813 -0.0457618 -0.0343542 -0.0062045 0.0093687 0.0220979 -0.0051357 -0.0772394 -0.0730231 -0.1033157 -0.0277664 -0.0213211 -0.0977954 -0.0612456
PatientSurveyStarRating_H_HSP_RATING_STAR_RATING PatientSurveyStarRating_H_HSP_RATING_STAR_RATING -0.1709175 -0.2016141 -0.1408372 0.1181224 0.8113195 0.7576397 0.7478412 0.7502954 0.6769019 0.8272215 0.5965227 0.6313199 1.0000000 0.8595636 0.8714961 0.8454781 0.7811879 0.7821354 0.7928741 0.7081668 0.8548308 0.6252851 0.6598602 0.9428150 0.9030319 -0.3410053 0.0325646 0.2088800 -0.2358015 -0.1677838 -0.2095003 0.0163640 0.0773879 0.1105834 0.1692958 0.0877507 0.0093791 0.1897215 0.1544748 0.2160677 0.1589966 0.2063053 -0.0348377 0.0691782 -0.0967251 -0.0951559 -0.0338034 0.0240132 -0.0613852 -0.0814552 -0.0490180 -0.0852573 -0.0166476 -0.0686735 0.0710883 -0.0270735 -0.1585705 -0.0643414 -0.1472927 -0.0205437 0.0287970 -0.1387904 -0.2018848
PatientSurveyStarRating_H_RECMND_STAR_RATING PatientSurveyStarRating_H_RECMND_STAR_RATING -0.1659007 -0.1642097 -0.0923845 0.1391159 0.7410800 0.7012869 0.6650712 0.6685334 0.6409371 0.7939928 0.5237951 0.5470896 0.8595636 1.0000000 0.7960527 0.7806696 0.7300827 0.6882451 0.7037121 0.6761773 0.8247493 0.5530915 0.5738169 0.9052988 0.9480759 -0.2979673 0.0850940 0.2212547 -0.1390380 -0.1405352 -0.1692993 -0.0078546 0.0885267 0.1190673 0.1532363 0.0735792 0.0265116 0.1760486 0.1358664 0.2348521 0.1378362 0.2020601 0.0037855 0.0981665 -0.1246042 -0.0981817 -0.0266626 -0.0005694 -0.1107731 -0.1033645 -0.0387585 -0.0857042 -0.0221487 -0.0800058 0.0787681 0.0075443 -0.1576927 -0.0393015 -0.1297623 0.0134380 0.0262619 -0.1202753 -0.2235957
PatientSurveyStarRating_H_STAR_RATING PatientSurveyStarRating_H_STAR_RATING -0.1783759 -0.2308937 -0.1744075 0.0779986 0.8821074 0.8135436 0.8260329 0.8320006 0.7586740 0.8743494 0.6391671 0.6730863 0.8714961 0.7960527 1.0000000 0.8920184 0.8314222 0.8501602 0.8592370 0.7769593 0.8837377 0.6599688 0.6959526 0.8852220 0.8308877 -0.3910783 0.0049679 0.2156832 -0.3210235 -0.2011786 -0.2229136 0.0373120 0.0727680 0.1107188 0.1500361 0.0637937 -0.0494590 0.1800467 0.1591593 0.1860190 0.1945060 0.1646616 -0.0552836 0.0051809 -0.0822439 -0.0710775 -0.0009117 0.0947303 -0.0145406 -0.0275058 -0.0283146 -0.0547617 0.0061083 -0.0156145 0.0913137 -0.0452172 -0.1540619 -0.0738343 -0.1614933 -0.0003415 0.0303266 -0.1263561 -0.1981220
HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE -0.1666746 -0.2067264 -0.1524571 0.0692166 0.9413741 0.7932584 0.8314522 0.8038535 0.7386381 0.8277780 0.5928460 0.6317832 0.8454781 0.7806696 0.8920184 1.0000000 0.8343982 0.8821498 0.8499295 0.7853207 0.8750498 0.6213823 0.6671546 0.8901557 0.8322887 -0.3639238 -0.0217481 0.2169674 -0.3107282 -0.1745392 -0.2104045 0.0422069 0.0800014 0.1588429 0.1603505 0.0495956 -0.0522724 0.1983566 0.1847906 0.2367758 0.2012691 0.2088467 -0.0544304 0.0173084 -0.0796721 -0.0452192 -0.0003548 0.1228267 0.0156651 -0.0170135 -0.0236955 -0.0303324 -0.0019546 -0.0080622 0.0845264 -0.0270956 -0.1428877 -0.0663743 -0.1459851 -0.0029199 0.0385911 -0.1096342 -0.2107792
HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE -0.1802262 -0.2352248 -0.1798454 0.0716989 0.7947199 0.9499147 0.7052054 0.7587947 0.6584544 0.7995932 0.5111389 0.6199984 0.7811879 0.7300827 0.8314222 0.8343982 1.0000000 0.7459134 0.8003061 0.6977774 0.8394911 0.5323986 0.6495609 0.8179081 0.7709408 -0.3129324 0.0155923 0.2128270 -0.2565866 -0.1409829 -0.1463552 0.0002792 0.0782762 0.0802578 0.1011092 0.0415646 -0.0658150 0.1433814 0.0975333 0.1984598 0.1920243 0.1718119 -0.0972506 -0.0314520 -0.0680571 -0.0565823 -0.0027657 0.0845920 0.0121242 0.0058744 0.0019106 -0.0063422 0.0334780 -0.0028885 0.0956884 -0.0236698 -0.1414366 -0.0380979 -0.1394736 -0.0143285 0.0425664 -0.0826332 -0.1875503
HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE -0.1831908 -0.2449572 -0.1926175 0.0290342 0.8439662 0.7139678 0.9423558 0.7931167 0.6847212 0.7605870 0.6220272 0.6414418 0.7821354 0.6882451 0.8501602 0.8821498 0.7459134 1.0000000 0.8363904 0.7154682 0.7981273 0.6483447 0.6695835 0.8221836 0.7369761 -0.3677548 -0.0905698 0.1363775 -0.4107053 -0.2259603 -0.2585483 0.0412981 0.0395669 0.0998162 0.1511649 0.0576566 -0.0681526 0.1772872 0.1643799 0.1340445 0.1572709 0.1065507 -0.1148325 -0.0405641 -0.0605862 -0.0215130 0.0433887 0.1663036 0.0511012 0.0476323 -0.0223315 -0.0117228 0.0152153 0.0187552 0.0790559 -0.0330533 -0.1408040 -0.0953948 -0.1388091 -0.0063199 0.0405341 -0.1131291 -0.1799323
HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE -0.1775098 -0.2546502 -0.2093859 0.0495607 0.8192660 0.7675074 0.7878299 0.9410632 0.7015421 0.8011319 0.5982789 0.6171349 0.7928741 0.7037121 0.8592370 0.8499295 0.8003061 0.8363904 1.0000000 0.7405838 0.8369260 0.6236211 0.6436063 0.8239488 0.7494949 -0.3259347 -0.0050286 0.1728899 -0.3312654 -0.1928398 -0.1986827 0.0259892 0.0644512 0.0816216 0.1501584 0.0726417 -0.0510706 0.1823598 0.1625916 0.1647114 0.2084062 0.1450104 -0.1112139 -0.0255467 -0.0557335 -0.0641464 -0.0091794 0.0704411 -0.0036037 0.0323273 -0.0128473 -0.0052418 0.0247052 -0.0039459 0.0793646 -0.0452983 -0.1557561 -0.0685223 -0.1386420 -0.0063967 0.0420468 -0.1086247 -0.1660056
HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE -0.1844376 -0.2079982 -0.1348924 0.0542992 0.7212981 0.6648163 0.6812130 0.6945471 0.9400388 0.7433629 0.4927072 0.4395790 0.7081668 0.6761773 0.7769593 0.7853207 0.6977774 0.7154682 0.7405838 1.0000000 0.7968806 0.5182906 0.4744589 0.7556492 0.7238004 -0.2900150 0.0369924 0.2564050 -0.2066316 -0.1376094 -0.1292560 0.0701172 0.1220570 0.1274513 0.1720468 0.0650279 -0.0549999 0.2122738 0.2084118 0.2236306 0.2431733 0.2738285 0.0383041 0.0559072 -0.1078990 -0.0733482 0.0081764 0.1281267 -0.0315360 0.0086356 0.0049870 0.0228481 0.0256121 -0.0176853 0.0859133 -0.0145875 -0.1539039 -0.0536226 -0.1430776 0.0069610 0.0649921 -0.0872152 -0.2310210
HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE -0.1877798 -0.2250397 -0.1608376 0.1106351 0.8279021 0.8027744 0.7546488 0.7903879 0.7549683 0.9482189 0.5781200 0.6249789 0.8548308 0.8247493 0.8837377 0.8750498 0.8394911 0.7981273 0.8369260 0.7968806 1.0000000 0.6025716 0.6612420 0.9026621 0.8770384 -0.3383196 0.0347599 0.2352166 -0.2492489 -0.1534787 -0.2043225 0.0287005 0.0923400 0.1255794 0.1636239 0.0630912 -0.0194129 0.2014907 0.1664733 0.2341396 0.1543470 0.2096414 -0.0268317 0.0534239 -0.1051209 -0.1115171 -0.0514076 0.0259241 -0.0829950 -0.0564195 -0.0211932 -0.0638071 0.0098029 -0.0381375 0.0825062 -0.0316508 -0.1629600 -0.0614646 -0.1636005 -0.0189605 0.0320446 -0.1202132 -0.2069970
HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE -0.1223047 -0.2045779 -0.1848471 0.0187289 0.6062342 0.5120334 0.6155860 0.5987567 0.4999730 0.5929605 0.9570846 0.5123475 0.6252851 0.5530915 0.6599688 0.6213823 0.5323986 0.6483447 0.6236211 0.5182906 0.6025716 1.0000000 0.5250997 0.6604779 0.5844106 -0.3337655 -0.0182565 0.0860619 -0.3285392 -0.1777808 -0.2595566 0.0253586 0.0135852 0.0584827 0.1855312 0.1034169 -0.0099818 0.2012726 0.1514801 0.0455756 0.1000540 0.0284015 -0.0880614 -0.0051760 -0.0539829 -0.0624749 -0.0255442 0.0677998 -0.0009274 -0.0795298 -0.0709592 -0.1182074 -0.0369162 0.0027541 -0.0109302 -0.0753999 -0.1233582 -0.0813748 -0.1261494 -0.0505286 0.0061946 -0.1486299 -0.0439537
HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE -0.1123794 -0.1801665 -0.1649195 0.0248142 0.6396807 0.6187305 0.6319284 0.6129310 0.4567781 0.6371231 0.5105668 0.9556614 0.6598602 0.5738169 0.6959526 0.6671546 0.6495609 0.6695835 0.6436063 0.4744589 0.6612420 0.5250997 1.0000000 0.6896579 0.6107149 -0.3437480 -0.1442006 0.1064024 -0.3661649 -0.1699262 -0.2145544 -0.0022965 0.0082056 0.0494650 0.0908135 0.0158871 -0.0303120 0.1023846 0.1022135 0.0817038 0.1098322 0.0437557 -0.0886019 -0.0634120 -0.0341816 0.0366437 0.0773892 0.1553155 0.0950135 0.0255069 -0.0454692 -0.0450913 -0.0062861 0.0051695 0.0262747 -0.0054371 -0.0789391 -0.0813685 -0.1096072 -0.0256155 -0.0196124 -0.1003157 -0.0653730
HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE -0.1779322 -0.2060912 -0.1425272 0.1116353 0.8361084 0.7826816 0.7774308 0.7793734 0.7098506 0.8571093 0.6248221 0.6537481 0.9428150 0.9052988 0.8852220 0.8901557 0.8179081 0.8221836 0.8239488 0.7556492 0.9026621 0.6604779 0.6896579 1.0000000 0.9580767 -0.3521196 0.0302154 0.2182505 -0.2448842 -0.1752428 -0.2290270 0.0077229 0.0865028 0.1100002 0.1826272 0.0844502 0.0072643 0.2115650 0.1691943 0.2211843 0.1738457 0.2315523 -0.0248375 0.0775236 -0.1091775 -0.0952111 -0.0230632 0.0300940 -0.0702915 -0.0760295 -0.0499063 -0.0948401 -0.0054742 -0.0590998 0.0707774 -0.0157615 -0.1615833 -0.0674588 -0.1458284 -0.0172231 0.0304634 -0.1390570 -0.2108956
HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE -0.1852873 -0.1911254 -0.1150026 0.1516789 0.7761632 0.7365285 0.6961148 0.7066986 0.6770439 0.8310741 0.5505967 0.5755335 0.9030319 0.9480759 0.8308877 0.8322887 0.7709408 0.7369761 0.7494949 0.7238004 0.8770384 0.5844106 0.6107149 0.9580767 1.0000000 -0.2951331 0.0831560 0.2197432 -0.1585545 -0.1471003 -0.1980854 -0.0179814 0.0967347 0.1229267 0.1676529 0.0687394 0.0292287 0.1933820 0.1501468 0.2439658 0.1443141 0.2442083 0.0058311 0.1184210 -0.1332048 -0.1095716 -0.0262870 -0.0003806 -0.1100101 -0.1074875 -0.0363290 -0.1008569 -0.0193730 -0.0826240 0.0862428 -0.0105068 -0.1710237 -0.0510987 -0.1364494 0.0003983 0.0331751 -0.1279752 -0.2364653
ED_2_Strata_1 ED_2_Strata_1 0.0762327 0.1082272 0.0843876 0.0425263 -0.3811431 -0.2976284 -0.3896187 -0.3575914 -0.3008610 -0.3553994 -0.3195008 -0.3375393 -0.3410053 -0.2979673 -0.3910783 -0.3639238 -0.3129324 -0.3677548 -0.3259347 -0.2900150 -0.3383196 -0.3337655 -0.3437480 -0.3521196 -0.2951331 1.0000000 0.1248128 0.0186099 0.5775206 0.4204676 0.3419958 -0.0684821 0.0164159 -0.0486040 -0.1152627 -0.0615764 -0.0354024 -0.1265965 -0.0558505 -0.0983294 0.0254940 -0.0925059 0.0761139 0.0321584 0.0673396 0.0579101 -0.0537002 -0.1136600 -0.0389080 0.0005912 0.0602674 -0.0541503 0.0551291 -0.0211509 -0.0336614 0.0315830 0.1430203 0.1038617 0.0817412 0.0633107 -0.0393782 0.1286484 0.1212071
HCP_COVID_19 HCP_COVID_19 -0.0508152 -0.0563082 -0.0316503 0.0450149 -0.0217341 0.0206259 -0.0747053 -0.0062821 0.0354071 0.0397427 -0.0225176 -0.1460561 0.0325646 0.0850940 0.0049679 -0.0217481 0.0155923 -0.0905698 -0.0050286 0.0369924 0.0347599 -0.0182565 -0.1442006 0.0302154 0.0831560 0.1248128 1.0000000 0.3203622 0.2574291 0.0698819 0.1122982 -0.0306155 0.1067941 -0.0812735 -0.0345104 0.0310392 -0.0124470 -0.0175650 -0.1149175 0.0947908 0.0304334 0.0831624 0.0241622 -0.0151698 -0.0510683 -0.0869890 -0.1128278 -0.1245435 -0.1523779 -0.0988833 0.0943953 0.0417007 0.0225916 -0.0272232 0.0549990 0.0009222 -0.0909811 0.1091949 -0.0160408 0.0169512 0.0430812 0.0534106 -0.0627505
IMM_3 IMM_3 -0.0583326 -0.0028840 0.0435410 0.0255192 0.1991033 0.2149395 0.1498815 0.1693415 0.2404893 0.2374707 0.0814169 0.0866767 0.2088800 0.2212547 0.2156832 0.2169674 0.2128270 0.1363775 0.1728899 0.2564050 0.2352166 0.0860619 0.1064024 0.2182505 0.2197432 0.0186099 0.3203622 1.0000000 0.1105628 0.0343661 0.0560372 0.0400235 0.1317922 0.0410289 0.0439297 0.0519538 -0.0201631 0.0361783 0.0484998 0.1318005 0.1020708 0.0900367 0.0906329 0.0058144 -0.0212916 -0.0165321 -0.0616051 -0.0010634 -0.0761104 0.0146397 0.0451508 0.0601831 0.0544576 -0.0311579 0.0899361 0.0412713 -0.0625676 0.0594933 -0.0250714 0.0723872 0.0625753 0.0226639 -0.0720431
OP_18b OP_18b 0.0505288 0.1295727 0.1403298 0.0895220 -0.3200784 -0.2473453 -0.3981251 -0.3234673 -0.2071783 -0.2572208 -0.3267508 -0.3615058 -0.2358015 -0.1390380 -0.3210235 -0.3107282 -0.2565866 -0.4107053 -0.3312654 -0.2066316 -0.2492489 -0.3285392 -0.3661649 -0.2448842 -0.1585545 0.5775206 0.2574291 0.1105628 1.0000000 0.4959758 0.5894838 -0.0756249 0.0506067 -0.1400845 -0.1714513 -0.0557159 0.0160417 -0.2007893 -0.1593730 0.0723310 -0.1190028 0.0948584 0.2344307 0.0629824 -0.0293698 -0.0678837 -0.1527290 -0.2195933 -0.1858291 -0.0905644 0.0583644 0.0638412 0.0187794 -0.0806516 0.0224833 0.0544528 -0.0076797 0.1653812 0.0993570 0.0676972 0.0374530 0.0986077 -0.0241965
OP_18c OP_18c 0.0838395 0.0877808 0.0491515 0.0668517 -0.1816118 -0.1452767 -0.2241446 -0.1918556 -0.1351116 -0.1626711 -0.1804160 -0.1742767 -0.1677838 -0.1405352 -0.2011786 -0.1745392 -0.1409829 -0.2259603 -0.1928398 -0.1376094 -0.1534787 -0.1777808 -0.1699262 -0.1752428 -0.1471003 0.4204676 0.0698819 0.0343661 0.4959758 1.0000000 0.3393524 0.0090382 0.0430774 -0.0573485 -0.0726374 -0.0430698 0.0018369 -0.0844129 -0.0481731 -0.0076033 -0.0476864 0.0158359 0.1235925 0.0408574 0.0028501 -0.0296718 -0.0977614 -0.1375348 -0.0578145 -0.0463395 0.0181970 0.0046272 0.0392571 -0.0457654 -0.0128057 0.0031479 0.0074122 0.0486693 0.0559662 0.0192234 0.0049833 0.0401196 0.0264108
OP_22 OP_22 0.0252830 0.0501727 0.0510729 0.0478896 -0.2152685 -0.1462753 -0.2360570 -0.1939137 -0.1225568 -0.2149719 -0.2479612 -0.2095914 -0.2095003 -0.1692993 -0.2229136 -0.2104045 -0.1463552 -0.2585483 -0.1986827 -0.1292560 -0.2043225 -0.2595566 -0.2145544 -0.2290270 -0.1980854 0.3419958 0.1122982 0.0560372 0.5894838 0.3393524 1.0000000 -0.0949210 0.0293870 -0.1014242 -0.2178259 -0.1066885 -0.0990718 -0.2279495 -0.1444805 -0.0066518 -0.0291353 0.0057888 0.0805708 -0.0317460 0.0214909 0.0376811 -0.0446228 -0.0767863 -0.0559254 0.0198257 0.0648778 0.1019307 0.0452860 0.0102293 0.0426036 0.0343100 0.0397075 0.0778917 0.0451515 0.0652307 0.0094012 0.0926476 -0.0237710
OP_23 OP_23 0.0485500 0.0614226 0.0366382 -0.0553207 0.0440556 0.0003840 0.0486530 0.0438526 0.0625152 0.0359131 0.0190913 0.0018524 0.0163640 -0.0078546 0.0373120 0.0422069 0.0002792 0.0412981 0.0259892 0.0701172 0.0287005 0.0253586 -0.0022965 0.0077229 -0.0179814 -0.0684821 -0.0306155 0.0400235 -0.0756249 0.0090382 -0.0949210 1.0000000 0.0732988 0.0371036 0.1919253 0.1198314 0.0629739 0.1834131 0.1440580 0.0821585 0.1352184 0.0401311 0.2485615 0.1664553 0.0398944 0.0034495 0.0135325 0.0045056 -0.0210473 -0.0614632 -0.0570477 -0.0326986 -0.0515403 0.0269822 -0.0140788 -0.0045339 0.0279871 -0.0606628 -0.0384416 -0.0593032 -0.0270588 -0.0530033 0.0545394
OP_29 OP_29 -0.0296026 -0.0106510 0.0098417 -0.0126112 0.0700383 0.0721589 0.0507759 0.0664748 0.1045421 0.0858822 0.0137026 -0.0044671 0.0773879 0.0885267 0.0727680 0.0800014 0.0782762 0.0395669 0.0644512 0.1220570 0.0923400 0.0135852 0.0082056 0.0865028 0.0967347 0.0164159 0.1067941 0.1317922 0.0506067 0.0430774 0.0293870 0.0732988 1.0000000 -0.0650231 0.0952846 0.0685334 0.0455411 0.1111026 0.0304354 0.0653732 0.0263917 0.0285800 0.1567526 0.0271825 -0.0096464 -0.0600569 0.0081252 0.0099705 -0.0536184 -0.0251654 -0.0032584 0.0312780 0.0059688 -0.0199006 0.0150699 0.0331569 -0.0837910 -0.0108546 -0.0087678 0.0163779 0.0419068 -0.0312757 -0.0815209
SAFE_USE_OF_OPIOIDS SAFE_USE_OF_OPIOIDS 0.0627005 0.1063002 0.0946020 0.0749623 0.1401327 0.0759631 0.0900327 0.0784938 0.1184473 0.1201701 0.0598688 0.0303764 0.1105834 0.1190673 0.1107188 0.1588429 0.0802578 0.0998162 0.0816216 0.1274513 0.1255794 0.0584827 0.0494650 0.1100002 0.1229267 -0.0486040 -0.0812735 0.0410289 -0.1400845 -0.0573485 -0.1014242 0.0371036 -0.0650231 1.0000000 0.0650110 0.0100193 0.0697614 0.0854069 0.0857646 0.1450732 0.0858291 0.1347354 -0.0563373 0.1913961 -0.0081923 -0.0643353 -0.0362573 0.0171605 -0.0204107 -0.0804707 -0.0287158 -0.0995591 -0.0603629 -0.0017558 -0.0043396 -0.0382369 0.0137283 -0.0380145 -0.0258097 -0.0166146 -0.0268805 -0.0300979 -0.0048449
SEP_1 SEP_1 -0.0309171 -0.0326387 -0.0189720 0.0004693 0.1288469 0.0920878 0.1321630 0.1324347 0.1443955 0.1393844 0.1690161 0.0919836 0.1692958 0.1532363 0.1500361 0.1603505 0.1011092 0.1511649 0.1501584 0.1720468 0.1636239 0.1855312 0.0908135 0.1826272 0.1676529 -0.1152627 -0.0345104 0.0439297 -0.1714513 -0.0726374 -0.2178259 0.1919253 0.0952846 0.0650110 1.0000000 0.7309445 0.5744973 0.8329106 0.6460143 0.0976908 0.0923975 0.1245526 0.2303117 0.2029424 -0.0347567 -0.0154393 0.0197865 0.0170609 -0.0335808 -0.0769699 -0.0797702 -0.0937443 -0.0339321 -0.0179990 -0.0245784 -0.0390884 -0.0564170 -0.0757690 -0.0254596 -0.0644595 -0.0051105 -0.1034135 -0.0078309
SEP_SH_3HR SEP_SH_3HR -0.0529663 -0.0689850 -0.0503969 -0.0353431 0.0438649 0.0418658 0.0521192 0.0587805 0.0557260 0.0474753 0.0858056 0.0246699 0.0877507 0.0735792 0.0637937 0.0495956 0.0415646 0.0576566 0.0726417 0.0650279 0.0630912 0.1034169 0.0158871 0.0844502 0.0687394 -0.0615764 0.0310392 0.0519538 -0.0557159 -0.0430698 -0.1066885 0.1198314 0.0685334 0.0100193 0.7309445 1.0000000 0.3894182 0.5124276 0.3029233 0.0214458 -0.0003039 0.0083388 0.0443562 0.0597759 -0.0068112 0.0143010 0.0121194 0.0396301 0.0312695 -0.0311801 -0.0131473 0.0329404 0.0208887 -0.0022962 0.0293689 -0.0176156 -0.0367514 -0.0072258 0.0030403 -0.0148982 0.0376698 -0.0243705 0.0044851
SEP_SH_6HR SEP_SH_6HR 0.0313734 0.0231777 0.0048063 0.0408255 -0.0683424 -0.0697479 -0.0725481 -0.0562912 -0.0637431 -0.0250525 -0.0192334 -0.0291912 0.0093791 0.0265116 -0.0494590 -0.0522724 -0.0658150 -0.0681526 -0.0510706 -0.0549999 -0.0194129 -0.0099818 -0.0303120 0.0072643 0.0292287 -0.0354024 -0.0124470 -0.0201631 0.0160417 0.0018369 -0.0990718 0.0629739 0.0455411 0.0697614 0.5744973 0.3894182 1.0000000 0.3849719 0.2338399 -0.0161607 -0.0334702 -0.0156808 0.1359384 0.1422354 -0.0041075 -0.0631275 -0.0279555 -0.0515598 -0.0787945 -0.0854897 -0.0659226 -0.0727333 -0.0738364 -0.0368684 -0.0345921 0.0185638 -0.0310002 -0.0467304 -0.0030434 -0.0228048 0.0032891 -0.0713756 0.0072647
SEV_SEP_3HR SEV_SEP_3HR -0.0322822 -0.0375511 -0.0227213 0.0074947 0.1642083 0.1295914 0.1605381 0.1606626 0.1782560 0.1732976 0.1869822 0.1003528 0.1897215 0.1760486 0.1800467 0.1983566 0.1433814 0.1772872 0.1823598 0.2122738 0.2014907 0.2012726 0.1023846 0.2115650 0.1933820 -0.1265965 -0.0175650 0.0361783 -0.2007893 -0.0844129 -0.2279495 0.1834131 0.1111026 0.0854069 0.8329106 0.5124276 0.3849719 1.0000000 0.4694083 0.1935342 0.1476882 0.2318112 0.2111767 0.2098349 -0.0375621 -0.0091369 0.0251530 0.0425397 -0.0246311 -0.0663090 -0.0624234 -0.0956269 -0.0378140 -0.0141213 -0.0257851 -0.0434036 -0.0699820 -0.0642500 -0.0485933 -0.0700502 -0.0047427 -0.0993958 -0.0154257
SEV_SEP_6HR SEV_SEP_6HR -0.0278551 -0.0014846 0.0212143 0.0166986 0.1503950 0.0899079 0.1368127 0.1423359 0.1758781 0.1422983 0.1376097 0.1010108 0.1544748 0.1358664 0.1591593 0.1847906 0.0975333 0.1643799 0.1625916 0.2084118 0.1664733 0.1514801 0.1022135 0.1691943 0.1501468 -0.0558505 -0.1149175 0.0484998 -0.1593730 -0.0481731 -0.1444805 0.1440580 0.0304354 0.0857646 0.6460143 0.3029233 0.2338399 0.4694083 1.0000000 0.0037384 0.0649257 0.0631441 0.2522518 0.2083065 -0.0109117 -0.0072140 0.0299786 0.0287674 -0.0152374 -0.0457369 -0.0572546 -0.0824732 -0.0074196 -0.0351047 -0.0587674 -0.0265019 -0.0400763 -0.0886155 -0.0494623 -0.0377753 -0.0179532 -0.0875069 -0.0092648
STK_02 STK_02 -0.0434539 0.0066866 0.0415201 0.0343487 0.2176041 0.1940882 0.1201354 0.1358181 0.2091327 0.2157600 0.0367533 0.0765341 0.2160677 0.2348521 0.1860190 0.2367758 0.1984598 0.1340445 0.1647114 0.2236306 0.2341396 0.0455756 0.0817038 0.2211843 0.2439658 -0.0983294 0.0947908 0.1318005 0.0723310 -0.0076033 -0.0066518 0.0821585 0.0653732 0.1450732 0.0976908 0.0214458 -0.0161607 0.1935342 0.0037384 1.0000000 0.3266488 0.8021725 0.4631419 0.4148180 -0.0319671 -0.0518587 -0.0534076 0.0093937 -0.0939059 -0.0905922 -0.0044494 -0.0746093 0.0129386 -0.0292526 0.0219699 -0.0138235 -0.0675289 0.0183344 -0.0210595 0.0392025 -0.0065909 -0.0294786 -0.1317922
STK_05 STK_05 -0.0205857 -0.0801055 -0.0867129 -0.0087498 0.1753654 0.1763581 0.1413712 0.1964279 0.2034132 0.1380616 0.0909053 0.1069590 0.1589966 0.1378362 0.1945060 0.2012691 0.1920243 0.1572709 0.2084062 0.2431733 0.1543470 0.1000540 0.1098322 0.1738457 0.1443141 0.0254940 0.0304334 0.1020708 -0.1190028 -0.0476864 -0.0291353 0.1352184 0.0263917 0.0858291 0.0923975 -0.0003039 -0.0334702 0.1476882 0.0649257 0.3266488 1.0000000 0.2500620 0.5880079 0.6976272 0.0512645 0.0331009 -0.0155739 0.1007144 -0.0037918 -0.0246261 0.0132861 -0.0270607 -0.0072465 0.0776413 0.0066697 -0.0467887 -0.0263715 -0.0426109 -0.0270503 0.0071556 0.0285160 -0.0154089 -0.0328044
STK_06 STK_06 -0.0425563 -0.0101158 0.0212012 0.0347316 0.1870897 0.1579604 0.0905514 0.1045092 0.1930126 0.1802407 0.0153703 0.0440800 0.2063053 0.2020601 0.1646616 0.2088467 0.1718119 0.1065507 0.1450104 0.2738285 0.2096414 0.0284015 0.0437557 0.2315523 0.2442083 -0.0925059 0.0831624 0.0900367 0.0948584 0.0158359 0.0057888 0.0401311 0.0285800 0.1347354 0.1245526 0.0083388 -0.0156808 0.2318112 0.0631441 0.8021725 0.2500620 1.0000000 0.4774227 0.5132722 -0.0207810 -0.0497188 -0.0477381 -0.0086007 -0.0990850 -0.0787235 -0.0203047 -0.0479955 -0.0035153 -0.0646189 0.0144757 -0.0298356 -0.0454564 0.0343065 -0.0293121 0.0542351 0.0085622 -0.0284310 -0.1296356
VTE_1 VTE_1 0.0039309 0.0654668 0.0866767 0.0586463 -0.0859359 -0.0887946 -0.0996751 -0.0999738 0.0235381 -0.0364693 -0.0848392 -0.0897193 -0.0348377 0.0037855 -0.0552836 -0.0544304 -0.0972506 -0.1148325 -0.1112139 0.0383041 -0.0268317 -0.0880614 -0.0886019 -0.0248375 0.0058311 0.0761139 0.0241622 0.0906329 0.2344307 0.1235925 0.0805708 0.2485615 0.1567526 -0.0563373 0.2303117 0.0443562 0.1359384 0.2111767 0.2522518 0.4631419 0.5880079 0.4774227 1.0000000 0.8736490 -0.0526925 -0.0493931 -0.0282911 -0.1051171 -0.1710235 -0.1238916 -0.0378500 -0.1180160 -0.0262166 -0.0522876 -0.0577213 -0.0037803 -0.0301775 -0.0317534 -0.0440578 0.0086141 0.0326658 -0.0475558 -0.1256363
VTE_2 VTE_2 0.0707936 0.1064994 0.0781368 0.0734882 -0.0029321 -0.0231504 -0.0477865 -0.0151359 0.0288319 0.0393084 -0.0027352 -0.0754020 0.0691782 0.0981665 0.0051809 0.0173084 -0.0314520 -0.0405641 -0.0255467 0.0559072 0.0534239 -0.0051760 -0.0634120 0.0775236 0.1184210 0.0321584 -0.0151698 0.0058144 0.0629824 0.0408574 -0.0317460 0.1664553 0.0271825 0.1913961 0.2029424 0.0597759 0.1422354 0.2098349 0.2083065 0.4148180 0.6976272 0.5132722 0.8736490 1.0000000 -0.0073599 -0.0995881 -0.1188383 -0.1492708 -0.1770482 -0.1181448 -0.0713006 -0.1762469 -0.0469516 -0.1120007 -0.0691325 0.0036343 -0.0247178 -0.0036828 0.0009109 0.0110118 -0.0279567 -0.0666219 0.0217206
Score_COMP_HIP_KNEE Score_COMP_HIP_KNEE 0.4350513 0.3208550 0.0742579 -0.1517362 -0.0761301 -0.0615070 -0.0405313 -0.0450744 -0.0990420 -0.1067242 -0.0571772 -0.0250746 -0.0967251 -0.1246042 -0.0822439 -0.0796721 -0.0680571 -0.0605862 -0.0557335 -0.1078990 -0.1051209 -0.0539829 -0.0341816 -0.1091775 -0.1332048 0.0673396 -0.0510683 -0.0212916 -0.0293698 0.0028501 0.0214909 0.0398944 -0.0096464 -0.0081923 -0.0347567 -0.0068112 -0.0041075 -0.0375621 -0.0109117 -0.0319671 0.0512645 -0.0207810 -0.0526925 -0.0073599 1.0000000 0.0830479 -0.0203930 -0.0007242 0.0241066 0.0211621 0.0498557 0.0038509 0.0505415 0.0577776 0.0540124 0.0813038 0.1279724 0.1458258 0.1334619 0.0498603 0.0433809 0.1604802 0.3410864
Score_MORT_30_AMI Score_MORT_30_AMI 0.0392653 0.0074065 -0.0287661 -0.1650703 -0.0479940 -0.0650654 -0.0177776 -0.0510480 -0.0682603 -0.1098730 -0.0630079 0.0428372 -0.0951559 -0.0981817 -0.0710775 -0.0452192 -0.0565823 -0.0215130 -0.0641464 -0.0733482 -0.1115171 -0.0624749 0.0366437 -0.0952111 -0.1095716 0.0579101 -0.0869890 -0.0165321 -0.0678837 -0.0296718 0.0376811 0.0034495 -0.0600569 -0.0643353 -0.0154393 0.0143010 -0.0631275 -0.0091369 -0.0072140 -0.0518587 0.0331009 -0.0497188 -0.0493931 -0.0995881 0.0830479 1.0000000 0.2498600 0.3407616 0.3309425 0.2222539 0.0415523 0.2105379 0.0885083 0.1010348 0.0889343 0.1066619 0.1037006 0.0492328 0.0467554 0.0454462 0.0297688 0.1129695 0.0591548
Score_MORT_30_COPD Score_MORT_30_COPD -0.0452304 -0.0794948 -0.0683850 -0.0843669 -0.0124762 -0.0163367 0.0302290 -0.0130589 0.0034724 -0.0673445 -0.0352455 0.0722857 -0.0338034 -0.0266626 -0.0009117 -0.0003548 -0.0027657 0.0433887 -0.0091794 0.0081764 -0.0514076 -0.0255442 0.0773892 -0.0230632 -0.0262870 -0.0537002 -0.1128278 -0.0616051 -0.1527290 -0.0977614 -0.0446228 0.0135325 0.0081252 -0.0362573 0.0197865 0.0121194 -0.0279555 0.0251530 0.0299786 -0.0534076 -0.0155739 -0.0477381 -0.0282911 -0.1188383 -0.0203930 0.2498600 1.0000000 0.3844105 0.3710744 0.2038243 -0.0069743 0.1713379 0.0478268 0.0397571 0.0429090 0.0320669 0.0426574 -0.0532586 0.0026944 0.0734846 0.0340007 0.0140214 -0.0406696
Score_MORT_30_HF Score_MORT_30_HF -0.0577013 -0.1067828 -0.0967183 -0.1375901 0.1078488 0.0722015 0.1568119 0.0644025 0.1194712 0.0151872 0.0665845 0.1510853 0.0240132 -0.0005694 0.0947303 0.1228267 0.0845920 0.1663036 0.0704411 0.1281267 0.0259241 0.0677998 0.1553155 0.0300940 -0.0003806 -0.1136600 -0.1245435 -0.0010634 -0.2195933 -0.1375348 -0.0767863 0.0045056 0.0099705 0.0171605 0.0170609 0.0396301 -0.0515598 0.0425397 0.0287674 0.0093937 0.1007144 -0.0086007 -0.1051171 -0.1492708 -0.0007242 0.3407616 0.3844105 1.0000000 0.4479367 0.3147371 0.0371596 0.2556384 0.0679149 0.1051698 0.0707269 0.0383771 0.0362529 -0.0300702 -0.0086832 0.0647245 0.0342374 0.0465081 -0.0350247
Score_MORT_30_PN Score_MORT_30_PN -0.0046351 -0.0985660 -0.1319780 -0.1231721 0.0183801 -0.0021513 0.0384967 -0.0046993 -0.0343594 -0.0880905 0.0009844 0.0902457 -0.0613852 -0.1107731 -0.0145406 0.0156651 0.0121242 0.0511012 -0.0036037 -0.0315360 -0.0829950 -0.0009274 0.0950135 -0.0702915 -0.1100101 -0.0389080 -0.1523779 -0.0761104 -0.1858291 -0.0578145 -0.0559254 -0.0210473 -0.0536184 -0.0204107 -0.0335808 0.0312695 -0.0787945 -0.0246311 -0.0152374 -0.0939059 -0.0037918 -0.0990850 -0.1710235 -0.1770482 0.0241066 0.3309425 0.3710744 0.4479367 1.0000000 0.3042563 0.0303815 0.2301195 0.0543554 0.0884315 0.0217880 0.0237048 0.0704445 0.0089560 0.0393676 0.0464407 0.0029691 0.0661595 -0.0062985
Score_MORT_30_STK Score_MORT_30_STK -0.0025521 -0.0376746 -0.0478311 -0.1345921 -0.0132887 -0.0165143 0.0435642 0.0257903 0.0159051 -0.0653713 -0.0651509 0.0337813 -0.0814552 -0.1033645 -0.0275058 -0.0170135 0.0058744 0.0476323 0.0323273 0.0086356 -0.0564195 -0.0795298 0.0255069 -0.0760295 -0.1074875 0.0005912 -0.0988833 0.0146397 -0.0905644 -0.0463395 0.0198257 -0.0614632 -0.0251654 -0.0804707 -0.0769699 -0.0311801 -0.0854897 -0.0663090 -0.0457369 -0.0905922 -0.0246261 -0.0787235 -0.1238916 -0.1181448 0.0211621 0.2222539 0.2038243 0.3147371 0.3042563 1.0000000 0.0687216 0.2380935 0.0878847 0.1014879 0.0674377 0.0622532 0.0725381 0.0474896 0.0513975 0.0492194 0.0625191 0.1142992 -0.0272101
Score_PSI_03 Score_PSI_03 -0.0058883 -0.0037334 0.0027232 -0.0339593 -0.0219430 -0.0016914 -0.0192053 -0.0221085 0.0012755 -0.0300329 -0.0629368 -0.0457618 -0.0490180 -0.0387585 -0.0283146 -0.0236955 0.0019106 -0.0223315 -0.0128473 0.0049870 -0.0211932 -0.0709592 -0.0454692 -0.0499063 -0.0363290 0.0602674 0.0943953 0.0451508 0.0583644 0.0181970 0.0648778 -0.0570477 -0.0032584 -0.0287158 -0.0797702 -0.0131473 -0.0659226 -0.0624234 -0.0572546 -0.0044494 0.0132861 -0.0203047 -0.0378500 -0.0713006 0.0498557 0.0415523 -0.0069743 0.0371596 0.0303815 0.0687216 1.0000000 0.1353085 0.0601750 0.0636661 0.1407342 0.0386211 0.0114365 0.1186788 0.0298580 0.0596798 0.0999683 0.7496827 0.0086745
Score_PSI_04 Score_PSI_04 0.0129772 -0.0449077 -0.0749480 -0.0798121 -0.0320167 -0.0041080 -0.0016542 -0.0185639 0.0047573 -0.0817158 -0.1272277 -0.0343542 -0.0852573 -0.0857042 -0.0547617 -0.0303324 -0.0063422 -0.0117228 -0.0052418 0.0228481 -0.0638071 -0.1182074 -0.0450913 -0.0948401 -0.1008569 -0.0541503 0.0417007 0.0601831 0.0638412 0.0046272 0.1019307 -0.0326986 0.0312780 -0.0995591 -0.0937443 0.0329404 -0.0727333 -0.0956269 -0.0824732 -0.0746093 -0.0270607 -0.0479955 -0.1180160 -0.1762469 0.0038509 0.2105379 0.1713379 0.2556384 0.2301195 0.2380935 0.1353085 1.0000000 0.0601419 0.0870693 0.1059485 0.0523892 0.0649032 0.0782559 0.0123489 0.0652098 0.1018205 0.1589978 -0.0766302
Score_PSI_06 Score_PSI_06 0.0156305 0.0154891 0.0017920 -0.0499866 -0.0184090 0.0301520 0.0274588 0.0273930 0.0312923 -0.0009316 -0.0390292 -0.0062045 -0.0166476 -0.0221487 0.0061083 -0.0019546 0.0334780 0.0152153 0.0247052 0.0256121 0.0098029 -0.0369162 -0.0062861 -0.0054742 -0.0193730 0.0551291 0.0225916 0.0544576 0.0187794 0.0392571 0.0452860 -0.0515403 0.0059688 -0.0603629 -0.0339321 0.0208887 -0.0738364 -0.0378140 -0.0074196 0.0129386 -0.0072465 -0.0035153 -0.0262166 -0.0469516 0.0505415 0.0885083 0.0478268 0.0679149 0.0543554 0.0878847 0.0601750 0.0601419 1.0000000 0.0724291 0.1014588 0.0516246 0.0351464 0.1431056 0.0509831 0.0527115 0.0910520 0.1455340 0.0456525
Score_PSI_08 Score_PSI_08 0.0386634 -0.0214412 -0.0595834 -0.1384081 0.0046992 -0.0171646 0.0159553 -0.0011887 -0.0120380 -0.0348072 0.0016051 0.0093687 -0.0686735 -0.0800058 -0.0156145 -0.0080622 -0.0028885 0.0187552 -0.0039459 -0.0176853 -0.0381375 0.0027541 0.0051695 -0.0590998 -0.0826240 -0.0211509 -0.0272232 -0.0311579 -0.0806516 -0.0457654 0.0102293 0.0269822 -0.0199006 -0.0017558 -0.0179990 -0.0022962 -0.0368684 -0.0141213 -0.0351047 -0.0292526 0.0776413 -0.0646189 -0.0522876 -0.1120007 0.0577776 0.1010348 0.0397571 0.1051698 0.0884315 0.1014879 0.0636661 0.0870693 0.0724291 1.0000000 0.0052449 -0.0360093 0.0198090 0.0394605 0.0093444 0.0228045 0.0127268 0.0624052 -0.0041983
Score_PSI_09 Score_PSI_09 -0.0459321 -0.0182303 0.0136214 -0.0209899 0.0835767 0.0956403 0.0859168 0.0897240 0.0883277 0.0827573 -0.0069280 0.0220979 0.0710883 0.0787681 0.0913137 0.0845264 0.0956884 0.0790559 0.0793646 0.0859133 0.0825062 -0.0109302 0.0262747 0.0707774 0.0862428 -0.0336614 0.0549990 0.0899361 0.0224833 -0.0128057 0.0426036 -0.0140788 0.0150699 -0.0043396 -0.0245784 0.0293689 -0.0345921 -0.0257851 -0.0587674 0.0219699 0.0066697 0.0144757 -0.0577213 -0.0691325 0.0540124 0.0889343 0.0429090 0.0707269 0.0217880 0.0674377 0.1407342 0.1059485 0.1014588 0.0052449 1.0000000 0.0885278 0.0680540 0.1732337 0.0519119 0.1207438 0.2197254 0.2331017 -0.0237660
Score_PSI_10 Score_PSI_10 0.0099239 0.0710046 0.0894074 -0.0728092 -0.0402520 -0.0267891 -0.0262995 -0.0322016 -0.0190535 -0.0399586 -0.0851994 -0.0051357 -0.0270735 0.0075443 -0.0452172 -0.0270956 -0.0236698 -0.0330533 -0.0452983 -0.0145875 -0.0316508 -0.0753999 -0.0054371 -0.0157615 -0.0105068 0.0315830 0.0009222 0.0412713 0.0544528 0.0031479 0.0343100 -0.0045339 0.0331569 -0.0382369 -0.0390884 -0.0176156 0.0185638 -0.0434036 -0.0265019 -0.0138235 -0.0467887 -0.0298356 -0.0037803 0.0036343 0.0813038 0.1066619 0.0320669 0.0383771 0.0237048 0.0622532 0.0386211 0.0523892 0.0516246 -0.0360093 0.0885278 1.0000000 0.1626632 0.1079488 0.2303938 0.0453739 0.0830134 0.2670390 0.0497447
Score_PSI_11 Score_PSI_11 0.1103031 0.1130121 0.0639191 -0.0689026 -0.1459187 -0.1367919 -0.1292050 -0.1375039 -0.1484276 -0.1668264 -0.1286917 -0.0772394 -0.1585705 -0.1576927 -0.1540619 -0.1428877 -0.1414366 -0.1408040 -0.1557561 -0.1539039 -0.1629600 -0.1233582 -0.0789391 -0.1615833 -0.1710237 0.1430203 -0.0909811 -0.0625676 -0.0076797 0.0074122 0.0397075 0.0279871 -0.0837910 0.0137283 -0.0564170 -0.0367514 -0.0310002 -0.0699820 -0.0400763 -0.0675289 -0.0263715 -0.0454564 -0.0301775 -0.0247178 0.1279724 0.1037006 0.0426574 0.0362529 0.0704445 0.0725381 0.0114365 0.0649032 0.0351464 0.0198090 0.0680540 0.1626632 1.0000000 0.1172504 0.2506376 -0.0093577 0.0464067 0.5858033 0.1441986
Score_PSI_12 Score_PSI_12 0.0939499 0.1047402 0.0654692 -0.0448292 -0.0669148 -0.0390690 -0.0865925 -0.0537461 -0.0612852 -0.0670446 -0.0807406 -0.0730231 -0.0643414 -0.0393015 -0.0738343 -0.0663743 -0.0380979 -0.0953948 -0.0685223 -0.0536226 -0.0614646 -0.0813748 -0.0813685 -0.0674588 -0.0510987 0.1038617 0.1091949 0.0594933 0.1653812 0.0486693 0.0778917 -0.0606628 -0.0108546 -0.0380145 -0.0757690 -0.0072258 -0.0467304 -0.0642500 -0.0886155 0.0183344 -0.0426109 0.0343065 -0.0317534 -0.0036828 0.1458258 0.0492328 -0.0532586 -0.0300702 0.0089560 0.0474896 0.1186788 0.0782559 0.1431056 0.0394605 0.1732337 0.1079488 0.1172504 1.0000000 0.1742084 0.0522204 0.1358951 0.3821290 0.0655557
Score_PSI_13 Score_PSI_13 0.1262199 0.1193336 0.0603447 -0.0832889 -0.1424419 -0.1430591 -0.1330731 -0.1257130 -0.1424055 -0.1593474 -0.1231640 -0.1033157 -0.1472927 -0.1297623 -0.1614933 -0.1459851 -0.1394736 -0.1388091 -0.1386420 -0.1430776 -0.1636005 -0.1261494 -0.1096072 -0.1458284 -0.1364494 0.0817412 -0.0160408 -0.0250714 0.0993570 0.0559662 0.0451515 -0.0384416 -0.0087678 -0.0258097 -0.0254596 0.0030403 -0.0030434 -0.0485933 -0.0494623 -0.0210595 -0.0270503 -0.0293121 -0.0440578 0.0009109 0.1334619 0.0467554 0.0026944 -0.0086832 0.0393676 0.0513975 0.0298580 0.0123489 0.0509831 0.0093444 0.0519119 0.2303938 0.2506376 0.1742084 1.0000000 0.0056987 0.0878105 0.4075564 0.0949467
Score_PSI_14 Score_PSI_14 -0.0138581 0.0140012 0.0328153 -0.0662610 0.0017770 -0.0079719 0.0092188 -0.0064032 -0.0023314 -0.0164980 -0.0459005 -0.0277664 -0.0205437 0.0134380 -0.0003415 -0.0029199 -0.0143285 -0.0063199 -0.0063967 0.0069610 -0.0189605 -0.0505286 -0.0256155 -0.0172231 0.0003983 0.0633107 0.0169512 0.0723872 0.0676972 0.0192234 0.0652307 -0.0593032 0.0163779 -0.0166146 -0.0644595 -0.0148982 -0.0228048 -0.0700502 -0.0377753 0.0392025 0.0071556 0.0542351 0.0086141 0.0110118 0.0498603 0.0454462 0.0734846 0.0647245 0.0464407 0.0492194 0.0596798 0.0652098 0.0527115 0.0228045 0.1207438 0.0453739 -0.0093577 0.0522204 0.0056987 1.0000000 0.1176726 0.0783006 -0.0181150
Score_PSI_15 Score_PSI_15 -0.0019014 -0.0158282 -0.0242059 -0.0665941 0.0329127 0.0359419 0.0480143 0.0443475 0.0662158 0.0357574 -0.0007875 -0.0213211 0.0287970 0.0262619 0.0303266 0.0385911 0.0425664 0.0405341 0.0420468 0.0649921 0.0320446 0.0061946 -0.0196124 0.0304634 0.0331751 -0.0393782 0.0430812 0.0625753 0.0374530 0.0049833 0.0094012 -0.0270588 0.0419068 -0.0268805 -0.0051105 0.0376698 0.0032891 -0.0047427 -0.0179532 -0.0065909 0.0285160 0.0085622 0.0326658 -0.0279567 0.0433809 0.0297688 0.0340007 0.0342374 0.0029691 0.0625191 0.0999683 0.1018205 0.0910520 0.0127268 0.2197254 0.0830134 0.0464067 0.1358951 0.0878105 0.1176726 1.0000000 0.2021298 -0.0467071
Score_PSI_90 Score_PSI_90 0.0882354 0.0973882 0.0626347 -0.0942117 -0.1122411 -0.0871039 -0.1032756 -0.1023631 -0.0911960 -0.1311288 -0.1463724 -0.0977954 -0.1387904 -0.1202753 -0.1263561 -0.1096342 -0.0826332 -0.1131291 -0.1086247 -0.0872152 -0.1202132 -0.1486299 -0.1003157 -0.1390570 -0.1279752 0.1286484 0.0534106 0.0226639 0.0986077 0.0401196 0.0926476 -0.0530033 -0.0312757 -0.0300979 -0.1034135 -0.0243705 -0.0713756 -0.0993958 -0.0875069 -0.0294786 -0.0154089 -0.0284310 -0.0475558 -0.0666219 0.1604802 0.1129695 0.0140214 0.0465081 0.0661595 0.1142992 0.7496827 0.1589978 0.1455340 0.0624052 0.2331017 0.2670390 0.5858033 0.3821290 0.4075564 0.0783006 0.2021298 1.0000000 0.1036455
Payment_PAYM_90_HIP_KNEE Payment_PAYM_90_HIP_KNEE 0.2740999 0.2975679 0.1808580 -0.1234255 -0.2121841 -0.1893123 -0.1564699 -0.1576660 -0.2089917 -0.1977109 -0.0409475 -0.0612456 -0.2018848 -0.2235957 -0.1981220 -0.2107792 -0.1875503 -0.1799323 -0.1660056 -0.2310210 -0.2069970 -0.0439537 -0.0653730 -0.2108956 -0.2364653 0.1212071 -0.0627505 -0.0720431 -0.0241965 0.0264108 -0.0237710 0.0545394 -0.0815209 -0.0048449 -0.0078309 0.0044851 0.0072647 -0.0154257 -0.0092648 -0.1317922 -0.0328044 -0.1296356 -0.1256363 0.0217206 0.3410864 0.0591548 -0.0406696 -0.0350247 -0.0062985 -0.0272101 0.0086745 -0.0766302 0.0456525 -0.0041983 -0.0237660 0.0497447 0.1441986 0.0655557 0.0949467 -0.0181150 -0.0467071 0.1036455 1.0000000

Preprocessing

Variable Encoding (SE)

Identify Categorical Variables

# Create function to find categorical variables
is_categorical <- function(x) is.factor(x) | is.character(x)

# Apply function to all variables in the dataset
categorical_vars <- sapply(HipKneeClean, is_categorical)

# Print the names of all categorical variables
categorical <- names(HipKneeClean)[categorical_vars]
categorical
## [1] "FacilityId"   "EDV"          "FacilityName" "State"

Ordinal encode the EDV column

# Define the encoding mapping (ignore NAs for now)
encoding_map <- c(
  'low' = 1,
  'medium' = 2,
  'high' = 3,
  'very high' = 4
)
# Ordinal (integer) encoding is used because the EDV levels have a natural order

# Create a copy of HipKneeClean and name it HipKneeTrain to separate cleaned dataset and the training dataset
HipKneeTrain <- HipKneeClean %>%
  mutate(EDV = recode(EDV, !!!encoding_map))

# Print first 20 rows of EDV column in HipKneeClean and HipKneeTrain to ensure proper encoding
cat("HipKneeClean")
## HipKneeClean
print(head(HipKneeClean$EDV, 20))
##  [1] "high"      "high"      "high"      "low"       "low"       "high"     
##  [7] "low"       "medium"    "low"       "medium"    "low"       "low"      
## [13] "high"      "high"      "very high" "very high" "low"       "high"     
## [19] "low"       "very high"
cat("HipKneeTrain")
## HipKneeTrain
print(head(HipKneeTrain$EDV, 20))
##  [1] 3 3 3 1 1 3 1 2 1 2 1 1 3 3 4 4 1 3 1 4
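
Because EDV is ordinal, an ordered factor is another way to carry the same information while keeping the level labels. The sketch below uses a small hypothetical vector (`edv_example`) rather than the real data, and assumes the same four levels as `encoding_map`.

```r
# Hypothetical sample of EDV values (not taken from the dataset)
edv_example <- c("high", "low", "medium", "very high", NA, "low")

# Levels listed from lowest to highest volume, matching encoding_map
edv_levels <- c("low", "medium", "high", "very high")

# An ordered factor keeps both the labels and their ranking
edv_factor <- factor(edv_example, levels = edv_levels, ordered = TRUE)

# Integer codes follow the level order, so they match the 1-4 mapping; NA stays NA
edv_codes <- as.integer(edv_factor)
print(edv_codes)
```

Keeping the factor alongside the integer codes makes it easier to label plots and tables later without reversing the mapping by hand.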

Encode each state in alphabetical order

# Manually map each state (alphabetical by state name) to a zero-padded character code;
# storing the codes as strings keeps them from being read as ordinal numbers
state_mapping <- c(
  "AL" = "001",
  "AK" = "002",
  "AZ" = "003",
  "AR" = "004",
  "CA" = "005",
  "CO" = "006",
  "CT" = "007",
  "DE" = "008",
  "FL" = "009",
  "GA" = "010",
  "HI" = "011",
  "ID" = "012",
  "IL" = "013",
  "IN" = "014",
  "IA" = "015",
  "KS" = "016",
  "KY" = "017",
  "LA" = "018",
  "ME" = "019",
  "MD" = "020",
  "MA" = "021",
  "MI" = "022",
  "MN" = "023",
  "MS" = "024",
  "MO" = "025",
  "MT" = "026",
  "NE" = "027",
  "NV" = "028",
  "NH" = "029",
  "NJ" = "030",
  "NM" = "031",
  "NY" = "032",
  "NC" = "033",
  "ND" = "034",
  "OH" = "035",
  "OK" = "036",
  "OR" = "037",
  "PA" = "038",
  "RI" = "039",
  "SC" = "040",
  "SD" = "041",
  "TN" = "042",
  "TX" = "043",
  "UT" = "044",
  "VT" = "045",
  "VA" = "046",
  "WA" = "047",
  "WV" = "048",
  "WI" = "049",
  "WY" = "050"
)

# Create new "StateCode" column with the encoded values
HipKneeTrain <- HipKneeTrain %>%
  mutate(StateCode = state_mapping[State])

# Print 100 rows of the "State" and "StateCode" columns to ensure accuracy
print("State and StateCode Columns")
## [1] "State and StateCode Columns"
print(head(HipKneeTrain[c("State", "StateCode")], 100))
## # A tibble: 100 × 2
##    State StateCode
##    <chr> <chr>    
##  1 AL    001      
##  2 AL    001      
##  3 AL    001      
##  4 AL    001      
##  5 AL    001      
##  6 AL    001      
##  7 AL    001      
##  8 AL    001      
##  9 AL    001      
## 10 AL    001      
## # ℹ 90 more rows
# Print all unique values in "StateCode" column to ensure accuracy
print("Unique StateCode Values")
## [1] "Unique StateCode Values"
print(unique(HipKneeTrain$StateCode))
##  [1] "001" "002" "003" "004" "005" "006" "007" "008" NA    "009" "010" "011"
## [13] "012" "013" "014" "015" "016" "017" "018" "019" "020" "021" "022" "023"
## [25] "024" "025" "026" "027" "028" "029" "030" "031" "032" "033" "034" "035"
## [37] "036" "037" "038" "039" "040" "041" "042" "043" "044" "045" "046" "047"
## [49] "048" "049" "050"
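
The `NA` in the output above means at least one row has a `State` value that is either missing or has no entry in `state_mapping`. A quick check like the following sketch can list such values. The vector `states_example`, the shortened `mapping_example`, and the value "DC" are illustrative assumptions, not confirmed contents of the dataset.

```r
# Hypothetical State values; "DC" stands in for any abbreviation absent from the mapping
states_example <- c("AL", "AK", "DC", NA, "WY")

# Shortened stand-in for the full state_mapping vector
mapping_example <- c("AL" = "001", "AK" = "002", "WY" = "050")

# Non-missing values that have no entry in the mapping
observed <- states_example[!is.na(states_example)]
unmapped <- setdiff(observed, names(mapping_example))
print(unmapped)

# Look up the codes; unname() drops the names carried over from the mapping
codes <- unname(mapping_example[states_example])
print(codes)
```

Running the same `setdiff()` on the real `State` column would show whether the `NA` codes come from truly missing states or from abbreviations left out of the mapping.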

Collinearity and Feature Removal (SE)

Remove correlated and unnecessary variables

# Specify columns to remove
columns_to_remove <- c(
  "ED_2_Strata_1",
  "OP_23",
  "VTE_2",
  "OP_18c",
  "OP_22",
  "STK_02",
  "STK_05",
  "STK_06",
  "HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE",
  "NumberOfReadmissions_HIP-KNEE",
  "ExcessReadmissionRatio_HIP-KNEE",
  "ExpectedReadmissionRate_HIP-KNEE",
  "SEP_1",
  "SEV_SEP_6HR",
  "SEV_SEP_3HR",
  "SEP_SH_6HR",
  "SEP_SH_3HR",
  "Score_PSI_90",
  "PatientSurveyStarRating_H_COMP_1_STAR_RATING",    
  "PatientSurveyStarRating_H_COMP_2_STAR_RATING",    
  "PatientSurveyStarRating_H_COMP_3_STAR_RATING",    
  "PatientSurveyStarRating_H_COMP_5_STAR_RATING",    
  "PatientSurveyStarRating_H_COMP_6_STAR_RATING",    
  "PatientSurveyStarRating_H_COMP_7_STAR_RATING",    
  "PatientSurveyStarRating_H_CLEAN_STAR_RATING",     
  "PatientSurveyStarRating_H_QUIET_STAR_RATING",     
  "PatientSurveyStarRating_H_HSP_RATING_STAR_RATING",
  "PatientSurveyStarRating_H_RECMND_STAR_RATING",    
  "PatientSurveyStarRating_H_STAR_RATING",
  "HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE",     
  "HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE"
)

# Remove specified columns
HipKneeTrain <- HipKneeTrain %>% select(-all_of(columns_to_remove))
# Print column names to verify
print("Remaining columns:")
## [1] "Remaining columns:"
print(colnames(HipKneeTrain))
##  [1] "FacilityId"                                     
##  [2] "PredictedReadmissionRate_HIP_KNEE"              
##  [3] "HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE"
##  [4] "EDV"                                            
##  [5] "HCP_COVID_19"                                   
##  [6] "IMM_3"                                          
##  [7] "OP_18b"                                         
##  [8] "OP_29"                                          
##  [9] "SAFE_USE_OF_OPIOIDS"                            
## [10] "VTE_1"                                          
## [11] "Score_COMP_HIP_KNEE"                            
## [12] "Score_MORT_30_AMI"                              
## [13] "Score_MORT_30_COPD"                             
## [14] "Score_MORT_30_HF"                               
## [15] "Score_MORT_30_PN"                               
## [16] "Score_MORT_30_STK"                              
## [17] "Score_PSI_03"                                   
## [18] "Score_PSI_04"                                   
## [19] "Score_PSI_06"                                   
## [20] "Score_PSI_08"                                   
## [21] "Score_PSI_09"                                   
## [22] "Score_PSI_10"                                   
## [23] "Score_PSI_11"                                   
## [24] "Score_PSI_12"                                   
## [25] "Score_PSI_13"                                   
## [26] "Score_PSI_14"                                   
## [27] "Score_PSI_15"                                   
## [28] "FacilityName"                                   
## [29] "State"                                          
## [30] "Payment_PAYM_90_HIP_KNEE"                       
## [31] "StateCode"

“OP_18c” and “OP_22” were removed because they are highly correlated with other variables and of low relevance. “ED_2_Strata_1”, “OP_23”, and “VTE_2” were removed due to a high percentage of missingness. “STK_02”, “STK_05”, and “STK_06” were removed because we do not consider stroke care data relevant to hip/knee surgery. “HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE” was removed because it is strongly correlated with the overall hospital rating. “ExcessReadmissionRatio_HIP-KNEE” and “ExpectedReadmissionRate_HIP-KNEE” were removed because they are highly correlated with the target variable. “NumberOfReadmissions_HIP-KNEE” was removed because it would be heavily influenced by hospital size, and we have no data on hospital sizes. The sepsis variables were removed because the dataset’s dictionary does not clearly define what their values represent. Score_PSI_90 was removed because it is a summary of the other PSI variables; since we include all the individual PSI variables, the summary variable is redundant.

We removed much of the patient survey data due to collinearity and redundancy. The average star rating data is redundant with the linear mean score data. Of the linear mean scores, we kept only the overall hospital rating, which we felt summarized the other, more granular, metrics. For example, COMP-1 is nurse communication, COMP-2 is doctor communication, and COMP-3 is staff responsiveness, so it makes sense that many of these were collinear. We considered engineering the COMP features (1-7) into a single patient experience variable, but the result was collinear with the overall hospital rating and recommendation scores. We also preferred the linear mean score over the star rating because the dataset has essentially already scaled it for us by applying a linear transformation.

Removing whole groups of variables feels risky, since we would not want to miss a meaningful relationship. Do you think this is a wise decision? Are there other ways to engineer these variables so that we can keep more of them?
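
One way to make removal decisions like these reproducible is to list every variable pair whose absolute correlation exceeds a cutoff. This is a minimal sketch on simulated data; the column names, the 0.8 cutoff, and the helper `high_cor_pairs` are assumptions for illustration and are not part of the original analysis.

```r
set.seed(42)
n <- 200
base_score <- rnorm(n)

# Simulated data: "rating" and "recommend" share a common signal, "quiet" does not
toy <- data.frame(
  rating    = base_score + rnorm(n, sd = 0.1),
  recommend = base_score + rnorm(n, sd = 0.1),
  quiet     = rnorm(n)
)

# Hypothetical helper: return variable pairs with |r| above the cutoff
high_cor_pairs <- function(df, cutoff = 0.8) {
  cm <- cor(df, use = "pairwise.complete.obs")
  # upper.tri() keeps each pair once and skips the diagonal
  idx <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
}

pairs_found <- high_cor_pairs(toy)
print(pairs_found)
```

Applied to the numeric columns of the real data, a table like this would document which pairs drove each removal and make the cutoff explicit.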

Reassess collinearity with heatmap and correlation matrix

# Compute correlation matrix
cor_matrix <- cor(HipKneeTrain %>% select_if(is.numeric), use = "pairwise.complete.obs")

# Melt the correlation matrix into a long format
cor_melted <- melt(cor_matrix)

# Plot heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Figure 5. Correlation Heatmap of Numeric Variables")

# Convert correlation matrix to df
cor_table <- as.data.frame(cor_matrix)

# Add variable names as a column
cor_table$Variable <- rownames(cor_table)

# Reorder columns
cor_table <- cor_table %>%
  select(Variable, everything())

# Print table
cor_table %>%
  kable(caption = "Table 8. Correlation Coefficients Table") %>%
  kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
Table 8. Correlation Coefficients Table
Variable PredictedReadmissionRate_HIP_KNEE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE EDV HCP_COVID_19 IMM_3 OP_18b OP_29 SAFE_USE_OF_OPIOIDS VTE_1 Score_COMP_HIP_KNEE Score_MORT_30_AMI Score_MORT_30_COPD Score_MORT_30_HF Score_MORT_30_PN Score_MORT_30_STK Score_PSI_03 Score_PSI_04 Score_PSI_06 Score_PSI_08 Score_PSI_09 Score_PSI_10 Score_PSI_11 Score_PSI_12 Score_PSI_13 Score_PSI_14 Score_PSI_15 Payment_PAYM_90_HIP_KNEE
PredictedReadmissionRate_HIP_KNEE PredictedReadmissionRate_HIP_KNEE 1.0000000 -0.2060912 0.1986939 -0.0563082 -0.0028840 0.1295727 -0.0106510 0.1063002 0.0654668 0.3208550 0.0074065 -0.0794948 -0.1067828 -0.0985660 -0.0376746 -0.0037334 -0.0449077 0.0154891 -0.0214412 -0.0182303 0.0710046 0.1130121 0.1047402 0.1193336 0.0140012 -0.0158282 0.2975679
HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE -0.2060912 1.0000000 -0.2262341 0.0302154 0.2182505 -0.2448842 0.0865028 0.1100002 -0.0248375 -0.1091775 -0.0952111 -0.0230632 0.0300940 -0.0702915 -0.0760295 -0.0499063 -0.0948401 -0.0054742 -0.0590998 0.0707774 -0.0157615 -0.1615833 -0.0674588 -0.1458284 -0.0172231 0.0304634 -0.2108956
EDV EDV 0.1986939 -0.2262341 1.0000000 0.1599806 0.0038674 0.5918897 0.0603877 -0.1223889 0.2992859 -0.0240093 -0.0687401 -0.0840621 -0.2588739 -0.1904351 -0.0754281 0.0292819 -0.0438108 -0.0308989 -0.1637017 -0.0181523 0.0837980 0.0455190 0.0749218 0.0724936 0.0399935 0.0144400 0.0053946
HCP_COVID_19 HCP_COVID_19 -0.0563082 0.0302154 0.1599806 1.0000000 0.3203622 0.2574291 0.1067941 -0.0812735 0.0241622 -0.0510683 -0.0869890 -0.1128278 -0.1245435 -0.1523779 -0.0988833 0.0943953 0.0417007 0.0225916 -0.0272232 0.0549990 0.0009222 -0.0909811 0.1091949 -0.0160408 0.0169512 0.0430812 -0.0627505
IMM_3 IMM_3 -0.0028840 0.2182505 0.0038674 0.3203622 1.0000000 0.1105628 0.1317922 0.0410289 0.0906329 -0.0212916 -0.0165321 -0.0616051 -0.0010634 -0.0761104 0.0146397 0.0451508 0.0601831 0.0544576 -0.0311579 0.0899361 0.0412713 -0.0625676 0.0594933 -0.0250714 0.0723872 0.0625753 -0.0720431
OP_18b OP_18b 0.1295727 -0.2448842 0.5918897 0.2574291 0.1105628 1.0000000 0.0506067 -0.1400845 0.2344307 -0.0293698 -0.0678837 -0.1527290 -0.2195933 -0.1858291 -0.0905644 0.0583644 0.0638412 0.0187794 -0.0806516 0.0224833 0.0544528 -0.0076797 0.1653812 0.0993570 0.0676972 0.0374530 -0.0241965
OP_29 OP_29 -0.0106510 0.0865028 0.0603877 0.1067941 0.1317922 0.0506067 1.0000000 -0.0650231 0.1567526 -0.0096464 -0.0600569 0.0081252 0.0099705 -0.0536184 -0.0251654 -0.0032584 0.0312780 0.0059688 -0.0199006 0.0150699 0.0331569 -0.0837910 -0.0108546 -0.0087678 0.0163779 0.0419068 -0.0815209
SAFE_USE_OF_OPIOIDS SAFE_USE_OF_OPIOIDS 0.1063002 0.1100002 -0.1223889 -0.0812735 0.0410289 -0.1400845 -0.0650231 1.0000000 -0.0563373 -0.0081923 -0.0643353 -0.0362573 0.0171605 -0.0204107 -0.0804707 -0.0287158 -0.0995591 -0.0603629 -0.0017558 -0.0043396 -0.0382369 0.0137283 -0.0380145 -0.0258097 -0.0166146 -0.0268805 -0.0048449
VTE_1 VTE_1 0.0654668 -0.0248375 0.2992859 0.0241622 0.0906329 0.2344307 0.1567526 -0.0563373 1.0000000 -0.0526925 -0.0493931 -0.0282911 -0.1051171 -0.1710235 -0.1238916 -0.0378500 -0.1180160 -0.0262166 -0.0522876 -0.0577213 -0.0037803 -0.0301775 -0.0317534 -0.0440578 0.0086141 0.0326658 -0.1256363
Score_COMP_HIP_KNEE Score_COMP_HIP_KNEE 0.3208550 -0.1091775 -0.0240093 -0.0510683 -0.0212916 -0.0293698 -0.0096464 -0.0081923 -0.0526925 1.0000000 0.0830479 -0.0203930 -0.0007242 0.0241066 0.0211621 0.0498557 0.0038509 0.0505415 0.0577776 0.0540124 0.0813038 0.1279724 0.1458258 0.1334619 0.0498603 0.0433809 0.3410864
Score_MORT_30_AMI Score_MORT_30_AMI 0.0074065 -0.0952111 -0.0687401 -0.0869890 -0.0165321 -0.0678837 -0.0600569 -0.0643353 -0.0493931 0.0830479 1.0000000 0.2498600 0.3407616 0.3309425 0.2222539 0.0415523 0.2105379 0.0885083 0.1010348 0.0889343 0.1066619 0.1037006 0.0492328 0.0467554 0.0454462 0.0297688 0.0591548
Score_MORT_30_COPD Score_MORT_30_COPD -0.0794948 -0.0230632 -0.0840621 -0.1128278 -0.0616051 -0.1527290 0.0081252 -0.0362573 -0.0282911 -0.0203930 0.2498600 1.0000000 0.3844105 0.3710744 0.2038243 -0.0069743 0.1713379 0.0478268 0.0397571 0.0429090 0.0320669 0.0426574 -0.0532586 0.0026944 0.0734846 0.0340007 -0.0406696
Score_MORT_30_HF Score_MORT_30_HF -0.1067828 0.0300940 -0.2588739 -0.1245435 -0.0010634 -0.2195933 0.0099705 0.0171605 -0.1051171 -0.0007242 0.3407616 0.3844105 1.0000000 0.4479367 0.3147371 0.0371596 0.2556384 0.0679149 0.1051698 0.0707269 0.0383771 0.0362529 -0.0300702 -0.0086832 0.0647245 0.0342374 -0.0350247
Score_MORT_30_PN Score_MORT_30_PN -0.0985660 -0.0702915 -0.1904351 -0.1523779 -0.0761104 -0.1858291 -0.0536184 -0.0204107 -0.1710235 0.0241066 0.3309425 0.3710744 0.4479367 1.0000000 0.3042563 0.0303815 0.2301195 0.0543554 0.0884315 0.0217880 0.0237048 0.0704445 0.0089560 0.0393676 0.0464407 0.0029691 -0.0062985
Score_MORT_30_STK Score_MORT_30_STK -0.0376746 -0.0760295 -0.0754281 -0.0988833 0.0146397 -0.0905644 -0.0251654 -0.0804707 -0.1238916 0.0211621 0.2222539 0.2038243 0.3147371 0.3042563 1.0000000 0.0687216 0.2380935 0.0878847 0.1014879 0.0674377 0.0622532 0.0725381 0.0474896 0.0513975 0.0492194 0.0625191 -0.0272101
Score_PSI_03 Score_PSI_03 -0.0037334 -0.0499063 0.0292819 0.0943953 0.0451508 0.0583644 -0.0032584 -0.0287158 -0.0378500 0.0498557 0.0415523 -0.0069743 0.0371596 0.0303815 0.0687216 1.0000000 0.1353085 0.0601750 0.0636661 0.1407342 0.0386211 0.0114365 0.1186788 0.0298580 0.0596798 0.0999683 0.0086745
Score_PSI_04 Score_PSI_04 -0.0449077 -0.0948401 -0.0438108 0.0417007 0.0601831 0.0638412 0.0312780 -0.0995591 -0.1180160 0.0038509 0.2105379 0.1713379 0.2556384 0.2301195 0.2380935 0.1353085 1.0000000 0.0601419 0.0870693 0.1059485 0.0523892 0.0649032 0.0782559 0.0123489 0.0652098 0.1018205 -0.0766302
Score_PSI_06 Score_PSI_06 0.0154891 -0.0054742 -0.0308989 0.0225916 0.0544576 0.0187794 0.0059688 -0.0603629 -0.0262166 0.0505415 0.0885083 0.0478268 0.0679149 0.0543554 0.0878847 0.0601750 0.0601419 1.0000000 0.0724291 0.1014588 0.0516246 0.0351464 0.1431056 0.0509831 0.0527115 0.0910520 0.0456525
Score_PSI_08 Score_PSI_08 -0.0214412 -0.0590998 -0.1637017 -0.0272232 -0.0311579 -0.0806516 -0.0199006 -0.0017558 -0.0522876 0.0577776 0.1010348 0.0397571 0.1051698 0.0884315 0.1014879 0.0636661 0.0870693 0.0724291 1.0000000 0.0052449 -0.0360093 0.0198090 0.0394605 0.0093444 0.0228045 0.0127268 -0.0041983
Score_PSI_09 Score_PSI_09 -0.0182303 0.0707774 -0.0181523 0.0549990 0.0899361 0.0224833 0.0150699 -0.0043396 -0.0577213 0.0540124 0.0889343 0.0429090 0.0707269 0.0217880 0.0674377 0.1407342 0.1059485 0.1014588 0.0052449 1.0000000 0.0885278 0.0680540 0.1732337 0.0519119 0.1207438 0.2197254 -0.0237660
Score_PSI_10 Score_PSI_10 0.0710046 -0.0157615 0.0837980 0.0009222 0.0412713 0.0544528 0.0331569 -0.0382369 -0.0037803 0.0813038 0.1066619 0.0320669 0.0383771 0.0237048 0.0622532 0.0386211 0.0523892 0.0516246 -0.0360093 0.0885278 1.0000000 0.1626632 0.1079488 0.2303938 0.0453739 0.0830134 0.0497447
Score_PSI_11 Score_PSI_11 0.1130121 -0.1615833 0.0455190 -0.0909811 -0.0625676 -0.0076797 -0.0837910 0.0137283 -0.0301775 0.1279724 0.1037006 0.0426574 0.0362529 0.0704445 0.0725381 0.0114365 0.0649032 0.0351464 0.0198090 0.0680540 0.1626632 1.0000000 0.1172504 0.2506376 -0.0093577 0.0464067 0.1441986
Score_PSI_12 Score_PSI_12 0.1047402 -0.0674588 0.0749218 0.1091949 0.0594933 0.1653812 -0.0108546 -0.0380145 -0.0317534 0.1458258 0.0492328 -0.0532586 -0.0300702 0.0089560 0.0474896 0.1186788 0.0782559 0.1431056 0.0394605 0.1732337 0.1079488 0.1172504 1.0000000 0.1742084 0.0522204 0.1358951 0.0655557
Score_PSI_13 Score_PSI_13 0.1193336 -0.1458284 0.0724936 -0.0160408 -0.0250714 0.0993570 -0.0087678 -0.0258097 -0.0440578 0.1334619 0.0467554 0.0026944 -0.0086832 0.0393676 0.0513975 0.0298580 0.0123489 0.0509831 0.0093444 0.0519119 0.2303938 0.2506376 0.1742084 1.0000000 0.0056987 0.0878105 0.0949467
Score_PSI_14 Score_PSI_14 0.0140012 -0.0172231 0.0399935 0.0169512 0.0723872 0.0676972 0.0163779 -0.0166146 0.0086141 0.0498603 0.0454462 0.0734846 0.0647245 0.0464407 0.0492194 0.0596798 0.0652098 0.0527115 0.0228045 0.1207438 0.0453739 -0.0093577 0.0522204 0.0056987 1.0000000 0.1176726 -0.0181150
Score_PSI_15 Score_PSI_15 -0.0158282 0.0304634 0.0144400 0.0430812 0.0625753 0.0374530 0.0419068 -0.0268805 0.0326658 0.0433809 0.0297688 0.0340007 0.0342374 0.0029691 0.0625191 0.0999683 0.1018205 0.0910520 0.0127268 0.2197254 0.0830134 0.0464067 0.1358951 0.0878105 0.1176726 1.0000000 -0.0467071
Payment_PAYM_90_HIP_KNEE Payment_PAYM_90_HIP_KNEE 0.2975679 -0.2108956 0.0053946 -0.0627505 -0.0720431 -0.0241965 -0.0815209 -0.0048449 -0.1256363 0.3410864 0.0591548 -0.0406696 -0.0350247 -0.0062985 -0.0272101 0.0086745 -0.0766302 0.0456525 -0.0041983 -0.0237660 0.0497447 0.1441986 0.0655557 0.0949467 -0.0181150 -0.0467071 1.0000000

Imputation and Handling of Missing Values (AC)

# Remove rows with a missing target variable "PredictedReadmissionRate_HIP_KNEE"
HipKneeTrain <- HipKneeTrain %>% filter(!is.na(PredictedReadmissionRate_HIP_KNEE))

# Remove rows with missing values in the "State", "StateCode", or "FacilityName" columns
HipKneeTrain <- HipKneeTrain %>% drop_na(State, StateCode, FacilityName)


# Print number of remaining variables and observations
dimensions <- dim(HipKneeTrain)
cat("Number of variables:", dimensions[2], "\n")
## Number of variables: 31
cat("Number of observations:", dimensions[1], "\n")
## Number of observations: 1833

We removed the one facility with a missing facility name, which was also the observation with a missing state value.

# Calculate missing values
missing_values_summary <- HipKneeTrain %>%
  summarise(across(everything(), ~ sum(is.na(.)))) %>%
  pivot_longer(cols = everything(), names_to = "Variable", values_to = "Missing_Count") %>%
  mutate(Missing_Percentage = (Missing_Count / nrow(HipKneeTrain)) * 100)

# Print table
missing_values_summary %>%
  kable(caption = "Table 7. Missing Values Summary") %>%
  kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
Table 7. Missing Values Summary
Variable Missing_Count Missing_Percentage
FacilityId 0 0.0000000
PredictedReadmissionRate_HIP_KNEE 0 0.0000000
HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 33 1.8003273
EDV 90 4.9099836
HCP_COVID_19 16 0.8728860
IMM_3 16 0.8728860
OP_18b 75 4.0916530
OP_29 222 12.1112930
SAFE_USE_OF_OPIOIDS 69 3.7643208
VTE_1 994 54.2280415
Score_COMP_HIP_KNEE 40 2.1822149
Score_MORT_30_AMI 405 22.0949264
Score_MORT_30_COPD 247 13.4751773
Score_MORT_30_HF 141 7.6923077
Score_MORT_30_PN 125 6.8194217
Score_MORT_30_STK 284 15.4937261
Score_PSI_03 8 0.4364430
Score_PSI_04 575 31.3693399
Score_PSI_06 2 0.1091107
Score_PSI_08 2 0.1091107
Score_PSI_09 2 0.1091107
Score_PSI_10 41 2.2367703
Score_PSI_11 40 2.1822149
Score_PSI_12 2 0.1091107
Score_PSI_13 42 2.2913257
Score_PSI_14 87 4.7463175
Score_PSI_15 29 1.5821058
FacilityName 0 0.0000000
State 0 0.0000000
Payment_PAYM_90_HIP_KNEE 42 2.2913257
StateCode 0 0.0000000

Impute numeric variables with low missingness (<5%) using the median

# Columns with <5% missing values, to be imputed with their median
numeric_vars_low_missing <- c("HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE", "EDV", "HCP_COVID_19", "IMM_3", "OP_18b", "SAFE_USE_OF_OPIOIDS", "Score_COMP_HIP_KNEE", "Score_PSI_03", "Score_PSI_06", "Score_PSI_08", "Score_PSI_09", "Score_PSI_10", "Score_PSI_11", "Score_PSI_12", "Score_PSI_13", "Score_PSI_14", "Score_PSI_15", "Payment_PAYM_90_HIP_KNEE")

for (var in numeric_vars_low_missing) {
  HipKneeTrain[[var]][is.na(HipKneeTrain[[var]])] <- median(HipKneeTrain[[var]], na.rm = TRUE)
}
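
The hard-coded variable list above could also be derived from the missingness percentages, which keeps the 5% cutoff in one place. The sketch below uses a toy data frame; `toy`, `cutoff`, and the column names are hypothetical.

```r
# Toy data frame with 40 rows: 1 NA = 2.5% missing, 10 NAs = 25% missing
toy <- data.frame(
  a = c(NA, 1:39),
  b = c(rep(NA, 10), 1:30),
  c = 1:40
)

# Percentage of missing values per column
cutoff <- 5
miss_pct <- colMeans(is.na(toy)) * 100

# Columns that have some missingness, but less than the cutoff
low_missing <- names(miss_pct)[miss_pct > 0 & miss_pct < cutoff]
print(low_missing)

# Median imputation for those columns only
for (var in low_missing) {
  toy[[var]][is.na(toy[[var]])] <- median(toy[[var]], na.rm = TRUE)
}
```

Column `b` is left untouched here because its missingness is above the cutoff, mirroring the split between median and KNN imputation.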

Impute variables with higher missingness (>5%) using KNN

# Select high missingness variables for KNN imputation
vars_for_knn <- c("VTE_1", "Score_MORT_30_AMI", "Score_MORT_30_COPD", "Score_MORT_30_HF", "Score_MORT_30_PN", "Score_MORT_30_STK", "Score_PSI_04", "OP_29")

# Perform KNN imputation
HipKneeTrain_knn <- kNN(HipKneeTrain, variable = vars_for_knn, k = 5)

# Remove columns created by the KNN function
HipKneeTrain_knn <- HipKneeTrain_knn %>% select(-ends_with("_imp"))

# Update HipKneeTrain with imputed values
HipKneeTrain[vars_for_knn] <- HipKneeTrain_knn[vars_for_knn]

Is this a good method for imputing missing values? Most of our variables had low missingness (<5%), so median imputation seemed adequate for them. For the variables with higher missingness, we used KNN imputation. Do you have any suggestions for approaches that would be more appropriate here?
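
To make the KNN step concrete, the sketch below imputes one missing value by hand: it finds the `k` rows closest to the incomplete row on the fully observed columns and takes the median of their values. It is a simplified illustration on made-up data using Euclidean distance on scaled columns. It is not how `VIM::kNN()` is implemented internally, and all names here are hypothetical.

```r
# Made-up data: rows 1-3 and 6 form one cluster, rows 4-5 another
toy <- data.frame(
  x1 = c(1.0, 1.1, 0.9, 5.0, 5.2, 1.05),
  x2 = c(2.0, 2.1, 1.9, 8.0, 8.1, 2.05),
  y  = c(10, 11, 9, 50, 52, NA)
)

# Hypothetical helper: impute `target` from the k nearest complete rows
knn_impute_one <- function(df, target, predictors, k = 3) {
  scaled <- scale(df[predictors])            # put predictors on a common scale
  missing_rows <- which(is.na(df[[target]]))
  donors <- which(!is.na(df[[target]]))      # rows with an observed target
  for (i in missing_rows) {
    # Euclidean distance from row i to every donor row
    diffs <- sweep(scaled[donors, , drop = FALSE], 2, scaled[i, ])
    d <- sqrt(rowSums(diffs^2))
    nearest <- donors[order(d)[seq_len(min(k, length(donors)))]]
    # Median of the neighbours' observed values
    df[[target]][i] <- median(df[[target]][nearest])
  }
  df
}

imputed <- knn_impute_one(toy, target = "y", predictors = c("x1", "x2"), k = 3)
print(imputed$y[6])
```

The example also shows why the choice of distance columns matters: identifiers or the target variable, if included as predictors, would influence which rows count as neighbours.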

# Calculate missing values
missing_values_summary <- HipKneeTrain %>%
  summarise(across(everything(), ~ sum(is.na(.)))) %>%
  pivot_longer(cols = everything(), names_to = "Variable", values_to = "Missing_Count") %>%
  mutate(Missing_Percentage = (Missing_Count / nrow(HipKneeTrain)) * 100)

# Print table
missing_values_summary %>%
  kable(caption = "Table 7. Missing Values Summary") %>%
  kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
Table 7. Missing Values Summary
Variable Missing_Count Missing_Percentage
FacilityId 0 0
PredictedReadmissionRate_HIP_KNEE 0 0
HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 0 0
EDV 0 0
HCP_COVID_19 0 0
IMM_3 0 0
OP_18b 0 0
OP_29 0 0
SAFE_USE_OF_OPIOIDS 0 0
VTE_1 0 0
Score_COMP_HIP_KNEE 0 0
Score_MORT_30_AMI 0 0
Score_MORT_30_COPD 0 0
Score_MORT_30_HF 0 0
Score_MORT_30_PN 0 0
Score_MORT_30_STK 0 0
Score_PSI_03 0 0
Score_PSI_04 0 0
Score_PSI_06 0 0
Score_PSI_08 0 0
Score_PSI_09 0 0
Score_PSI_10 0 0
Score_PSI_11 0 0
Score_PSI_12 0 0
Score_PSI_13 0 0
Score_PSI_14 0 0
Score_PSI_15 0 0
FacilityName 0 0
State 0 0
Payment_PAYM_90_HIP_KNEE 0 0
StateCode 0 0

Feature Engineer Mortality Data (AC)

# Average death rates amongst mortality variables and create new column "Score_Ovr_MORT"
HipKneeTrain$Score_Ovr_MORT <- rowMeans(
  HipKneeTrain[, c("Score_MORT_30_AMI",
                   "Score_MORT_30_COPD",
                   "Score_MORT_30_HF",
                   "Score_MORT_30_PN",
                   "Score_MORT_30_STK")],
  na.rm = TRUE
)

# Remove old mortality columns
HipKneeTrain <- HipKneeTrain[, !(names(HipKneeTrain) %in% c("Score_MORT_30_AMI", 
                                                            "Score_MORT_30_COPD",
                                                            "Score_MORT_30_HF", 
                                                            "Score_MORT_30_PN", 
                                                            "Score_MORT_30_STK"))]

Reassess heatmap with engineered mortality data

# Compute correlation matrix
cor_matrix <- cor(HipKneeTrain %>% select_if(is.numeric), use = "pairwise.complete.obs")

# Melt the correlation matrix into a long format
cor_melted <- melt(cor_matrix)

# Plot heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Figure 5. Correlation Heatmap of Numeric Variables")

Preprocessing the test dataset (AC)

We use the most recent snapshot, dated 04/24/2024, as our test set. Evaluating on data the model has never seen helps check that it generalizes and remains useful for future analyses.

Loading the Data

# Set the directory for the data files
filepath <- "/Users/adelinecasali/Desktop/hospitals_04_2024/" 

# List the files in the directory whose names end in "Hospital.csv"
files <- list.files(path = filepath, pattern = "Hospital\\.csv$")

# Iterate through each file in the list
for (f in seq_along(files)) {

  # Read the CSV, clean column names to upper camel case, and store in "dat"
  dat <- clean_names(read_csv(paste0(filepath, files[f]),
                              show_col_types = FALSE),
                     case = "upper_camel")

  # Strip the separator character plus "Hospital.csv" to create the variable name
  filename <- gsub(".Hospital\\.csv", "", files[f])

  # Assign data to a variable with the above created name
  assign(filename, dat)
}
# Create a df of file names without ".Hospital.csv"
files <- gsub(".Hospital\\.csv", "", files) %>% data.frame()

# Set column name of the df to "File Name"
names(files) <- "File Name"

files %>% 
  kable(
    format = "html",
    caption = "Table 1. List of hospital-level data files.") %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Table 1. List of hospital-level data files.
File Name
Complications_and_Deaths
FY_2024_HAC_Reduction_Program
FY_2024_Hospital_Readmissions_Reduction_Program
HCAHPS
Healthcare_Associated_Infections
Maternal_Health
Medicare_Hospital_Spending_Per_Patient
Outpatient_Imaging_Efficiency
Payment_and_Value_of_Care
Timely_and_Effective_Care
Unplanned_Hospital_Visits

Exploring and Preprocessing the FY_2024_Hospital_Readmissions_Reduction_Program dataset (AC)

Viewing and checking for missing values

# Display first 10 rows of FY_2024_Hospital_Readmissions_Reduction_Program 
head(FY_2024_Hospital_Readmissions_Reduction_Program,10)
## # A tibble: 10 × 12
##    FacilityName         FacilityId State MeasureName NumberOfDischarges Footnote
##    <chr>                <chr>      <chr> <chr>       <chr>                 <dbl>
##  1 SOUTHEAST HEALTH ME… 010001     AL    READM-30-H… N/A                      NA
##  2 SOUTHEAST HEALTH ME… 010001     AL    READM-30-H… 616                      NA
##  3 SOUTHEAST HEALTH ME… 010001     AL    READM-30-A… 274                      NA
##  4 SOUTHEAST HEALTH ME… 010001     AL    READM-30-P… 404                      NA
##  5 SOUTHEAST HEALTH ME… 010001     AL    READM-30-C… 126                      NA
##  6 SOUTHEAST HEALTH ME… 010001     AL    READM-30-C… 117                      NA
##  7 MARSHALL MEDICAL CE… 010005     AL    READM-30-A… N/A                       1
##  8 MARSHALL MEDICAL CE… 010005     AL    READM-30-C… 137                      NA
##  9 MARSHALL MEDICAL CE… 010005     AL    READM-30-P… 285                      NA
## 10 MARSHALL MEDICAL CE… 010005     AL    READM-30-H… 129                      NA
## # ℹ 6 more variables: ExcessReadmissionRatio <chr>,
## #   PredictedReadmissionRate <chr>, ExpectedReadmissionRate <chr>,
## #   NumberOfReadmissions <chr>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  select_if(is.numeric)

# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## Footnote 
##    12077

Replacing “N/A” values with NA and “Too Few to Report” values with “5”

# Use the function "replace_with_na_all()" to replace aberrant values with NA
FY_2024_Hospital_Readmissions_Reduction_Program <- replace_with_na_all(FY_2024_Hospital_Readmissions_Reduction_Program, condition = ~ .x == "N/A")

# Replace "Too Few to Report" values with the placeholder "5" using gsub
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions <- gsub("Too Few to Report", "5", FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions)

# Check first 10 rows to confirm that it worked
head(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions, 10)
##  [1] "5"   "149" "32"  "68"  "11"  "20"  NA    "14"  "40"  "24"
# Convert NumberOfReadmissions to numeric before substituting random integers
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions <- as.numeric(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions)

# Find all values of "5" in NumberOfReadmissions
fives <- which(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions == 5)

# Replace values of "5" with random integers from 1 - 10
FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions[fives] <- sample(1:10, length(fives), replace = TRUE)

# Check the first 20 rows to see if this was applied correctly
head(FY_2024_Hospital_Readmissions_Reduction_Program$NumberOfReadmissions,20)
##  [1]   3 149  32  68  11  20  NA  14  40  24   1  NA   7  21  15  83  36  75   5
## [20]  NA
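
Because the substitution above finds suppressed counts through the placeholder value 5, it would also overwrite any facility that genuinely reported 5 readmissions, and the unseeded draw changes between runs. A minimal sketch of one alternative, assuming a toy vector and an arbitrary seed (both hypothetical), records which entries were suppressed before any value is changed:

```r
# Toy column mimicking the raw character values (hypothetical data)
readm <- c("Too Few to Report", "149", "5", NA, "Too Few to Report")

# Record which entries were suppressed before changing anything
suppressed <- !is.na(readm) & readm == "Too Few to Report"

# Convert to numeric; suppressed entries become NA for now
readm_num <- suppressWarnings(as.numeric(readm))

# Fix the seed (arbitrary value) so the random fill is reproducible
set.seed(123)
readm_num[suppressed] <- sample(1:10, sum(suppressed), replace = TRUE)
```

With this approach the reported value of 5 in the third position is preserved, and only the two suppressed entries receive random integers between 1 and 10.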

Converting columns to numeric

# Selecting the columns to convert
columns_to_convert <- c("NumberOfDischarges", "ExcessReadmissionRatio", "PredictedReadmissionRate", "ExpectedReadmissionRate", "NumberOfReadmissions")

# Use mutate(across()) to convert the specified columns to numeric
FY_2024_Hospital_Readmissions_Reduction_Program <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  mutate(across(all_of(columns_to_convert), as.numeric))

# Print the structure of the dataframe to check the changes
str(FY_2024_Hospital_Readmissions_Reduction_Program)
## tibble [18,774 × 12] (S3: tbl_df/tbl/data.frame)
##  $ FacilityName            : chr [1:18774] "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" "SOUTHEAST HEALTH MEDICAL CENTER" ...
##  $ FacilityId              : chr [1:18774] "010001" "010001" "010001" "010001" ...
##  $ State                   : chr [1:18774] "AL" "AL" "AL" "AL" ...
##  $ MeasureName             : chr [1:18774] "READM-30-HIP-KNEE-HRRP" "READM-30-HF-HRRP" "READM-30-AMI-HRRP" "READM-30-PN-HRRP" ...
##  $ NumberOfDischarges      : num [1:18774] NA 616 274 404 126 117 NA 137 285 129 ...
##  $ Footnote                : num [1:18774] NA NA NA NA NA NA 1 NA NA NA ...
##  $ ExcessReadmissionRatio  : num [1:18774] 0.892 1.1 0.933 0.987 0.952 ...
##  $ PredictedReadmissionRate: num [1:18774] 3.53 23.13 12.9 17.05 9.81 ...
##  $ ExpectedReadmissionRate : num [1:18774] 3.96 21.02 13.83 17.28 10.31 ...
##  $ NumberOfReadmissions    : num [1:18774] 3 149 32 68 11 20 NA 14 40 24 ...
##  $ StartDate               : chr [1:18774] "07/01/2019" "07/01/2019" "07/01/2019" "07/01/2019" ...
##  $ EndDate                 : chr [1:18774] "06/30/2022" "06/30/2022" "06/30/2022" "06/30/2022" ...

Removing excess text from measure names

FY_2024_Hospital_Readmissions_Reduction_Program <-  FY_2024_Hospital_Readmissions_Reduction_Program %>%
  mutate(MeasureName = gsub("READM-30-", "", MeasureName)) %>% 
  mutate(MeasureName = gsub("-HRRP", "", MeasureName)) 

Pivoting the data wider

readmissionsClean <- FY_2024_Hospital_Readmissions_Reduction_Program %>%
  pivot_wider(
    names_from = MeasureName, 
    values_from = c(NumberOfDischarges, ExcessReadmissionRatio, PredictedReadmissionRate, ExpectedReadmissionRate, NumberOfReadmissions), 
    id_cols = c(FacilityName, FacilityId, State, StartDate, EndDate)
  )

# Check the new dataframe
dim(readmissionsClean)
## [1] 3129   35
head(readmissionsClean)
## # A tibble: 6 × 35
##   FacilityName         FacilityId State StartDate EndDate NumberOfDischarges_H…¹
##   <chr>                <chr>      <chr> <chr>     <chr>                    <dbl>
## 1 SOUTHEAST HEALTH ME… 010001     AL    07/01/20… 06/30/…                     NA
## 2 MARSHALL MEDICAL CE… 010005     AL    07/01/20… 06/30/…                     NA
## 3 NORTH ALABAMA MEDIC… 010006     AL    07/01/20… 06/30/…                     NA
## 4 MIZELL MEMORIAL HOS… 010007     AL    07/01/20… 06/30/…                     NA
## 5 CRENSHAW COMMUNITY … 010008     AL    07/01/20… 06/30/…                     NA
## 6 ST. VINCENT'S EAST   010011     AL    07/01/20… 06/30/…                     NA
## # ℹ abbreviated name: ¹​`NumberOfDischarges_HIP-KNEE`
## # ℹ 29 more variables: NumberOfDischarges_HF <dbl>,
## #   NumberOfDischarges_AMI <dbl>, NumberOfDischarges_PN <dbl>,
## #   NumberOfDischarges_CABG <dbl>, NumberOfDischarges_COPD <dbl>,
## #   `ExcessReadmissionRatio_HIP-KNEE` <dbl>, ExcessReadmissionRatio_HF <dbl>,
## #   ExcessReadmissionRatio_AMI <dbl>, ExcessReadmissionRatio_PN <dbl>,
## #   ExcessReadmissionRatio_CABG <dbl>, ExcessReadmissionRatio_COPD <dbl>, …

Filtering for only hip/knee conditions

readmissionsClean <- readmissionsClean %>%
  select(FacilityName, FacilityId, State, matches("HIP-KNEE$"))

Exploring and Preprocessing the HCAHPS dataset

Viewing and checking for missing values

# Display first 10 rows of HCAHPS 
head(HCAHPS,10)
## # A tibble: 10 × 22
##    FacilityId FacilityName           Address CityTown State ZipCode CountyParish
##    <chr>      <chr>                  <chr>   <chr>    <chr> <chr>   <chr>       
##  1 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  2 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  3 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  4 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  5 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  6 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  7 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  8 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  9 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## 10 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## # ℹ 15 more variables: TelephoneNumber <chr>, HcahpsMeasureId <chr>,
## #   HcahpsQuestion <chr>, HcahpsAnswerDescription <chr>,
## #   PatientSurveyStarRating <chr>, PatientSurveyStarRatingFootnote <dbl>,
## #   HcahpsAnswerPercent <chr>, HcahpsAnswerPercentFootnote <chr>,
## #   HcahpsLinearMeanValue <chr>, NumberOfCompletedSurveys <chr>,
## #   NumberOfCompletedSurveysFootnote <chr>, SurveyResponseRatePercent <chr>,
## #   SurveyResponseRatePercentFootnote <chr>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- HCAHPS %>%
  select_if(is.numeric)

# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## PatientSurveyStarRatingFootnote 
##                          430641

Removing footnote columns and replacing NA values

# Removing all footnote columns
HCAHPS <- HCAHPS %>%
  select(-ends_with("footnote"))

# Replacing all "Not Applicable" with NA
HCAHPS <- as.data.frame(sapply(HCAHPS, function(x) {
  if (is.character(x)) {
    x[x == "Not Applicable"] <- NA
  }
  return(x)
}))

# Replacing all "Not Available" with NA
HCAHPS <- as.data.frame(sapply(HCAHPS, function(x) {
  if (is.character(x)) {
    x[x == "Not Available"] <- NA
  }
  return(x)
}))
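
The two placeholder replacements recur for every dataset in this section, so they could be folded into a single helper. The following is a base-R sketch under stated assumptions: the helper name strings_to_na and the toy data frame are hypothetical, and only the two placeholder strings seen above are handled.

```r
# Toy data frame with both placeholder strings (hypothetical values)
toy <- data.frame(
  FacilityId = c("010001", "010005", "010006"),
  Score      = c("89", "Not Available", "Not Applicable"),
  stringsAsFactors = FALSE
)

# Placeholder strings that should be treated as missing
na_strings <- c("Not Applicable", "Not Available")

# Hypothetical helper: set placeholder strings to NA in character columns only
strings_to_na <- function(df, strings = na_strings) {
  df[] <- lapply(df, function(x) {
    if (is.character(x)) x[x %in% strings] <- NA
    x
  })
  df
}

toy_clean <- strings_to_na(toy)
```

Assigning into df[] keeps the object's class and leaves non-character columns as they are, whereas as.data.frame(sapply(...)) passes through a matrix and coerces mixed column types to character.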

Pivoting the data wider

HCAHPSClean <- HCAHPS %>%
  pivot_wider(
    names_from = HcahpsMeasureId, 
    values_from = c(PatientSurveyStarRating, HcahpsAnswerPercent, HcahpsLinearMeanValue, SurveyResponseRatePercent), 
    id_cols = c(FacilityName, FacilityId, State)
  )

# Check the new dataframe
dim(HCAHPSClean)
## [1] 4814  375
head(HCAHPSClean)
## # A tibble: 6 × 375
##   FacilityName    FacilityId State PatientSurveyStarRat…¹ PatientSurveyStarRat…²
##   <chr>           <chr>      <chr> <chr>                  <chr>                 
## 1 SOUTHEAST HEAL… 010001     AL    <NA>                   <NA>                  
## 2 MARSHALL MEDIC… 010005     AL    <NA>                   <NA>                  
## 3 NORTH ALABAMA … 010006     AL    <NA>                   <NA>                  
## 4 MIZELL MEMORIA… 010007     AL    <NA>                   <NA>                  
## 5 CRENSHAW COMMU… 010008     AL    <NA>                   <NA>                  
## 6 ST. VINCENT'S … 010011     AL    <NA>                   <NA>                  
## # ℹ abbreviated names: ¹​PatientSurveyStarRating_H_COMP_1_A_P,
## #   ²​PatientSurveyStarRating_H_COMP_1_SN_P
## # ℹ 370 more variables: PatientSurveyStarRating_H_COMP_1_U_P <chr>,
## #   PatientSurveyStarRating_H_COMP_1_LINEAR_SCORE <chr>,
## #   PatientSurveyStarRating_H_COMP_1_STAR_RATING <chr>,
## #   PatientSurveyStarRating_H_NURSE_RESPECT_A_P <chr>,
## #   PatientSurveyStarRating_H_NURSE_RESPECT_SN_P <chr>, …

Exploring and Preprocessing the Timely_and_Effective_Care dataset

Viewing and checking for missing values

# Display first 10 rows of Timely_and_Effective_Care
head(Timely_and_Effective_Care,10)
## # A tibble: 10 × 16
##    FacilityId FacilityName           Address CityTown State ZipCode CountyParish
##    <chr>      <chr>                  <chr>   <chr>    <chr> <chr>   <chr>       
##  1 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  2 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  3 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  4 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  5 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  6 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  7 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  8 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  9 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## 10 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## # ℹ 9 more variables: TelephoneNumber <chr>, Condition <chr>, MeasureId <chr>,
## #   MeasureName <chr>, Score <chr>, Sample <chr>, Footnote <chr>,
## #   StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- Timely_and_Effective_Care %>%
  select_if(is.numeric)

# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## named list()

Replacing NA values

# Replacing all "Not Applicable" with NA
Timely_and_Effective_Care <- as.data.frame(sapply(Timely_and_Effective_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Applicable"] <- NA
  }
  return(x)
}))

# Replacing all "Not Available" with NA
Timely_and_Effective_Care <- as.data.frame(sapply(Timely_and_Effective_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Available"] <- NA
  }
  return(x)
}))

Pivoting the data wider

careClean <- Timely_and_Effective_Care %>%
  pivot_wider(
    names_from = MeasureId, 
    values_from = c(Score), 
    id_cols = c(FacilityName, FacilityId, State)
  )

# Check the new dataframe
dim(careClean)
## [1] 4677   26
head(careClean)
## # A tibble: 6 × 26
##   FacilityName   FacilityId State EDV   ED_2_Strata_1 ED_2_Strata_2 HCP_COVID_19
##   <chr>          <chr>      <chr> <chr> <chr>         <chr>         <chr>       
## 1 SOUTHEAST HEA… 010001     AL    high  <NA>          <NA>          80.7        
## 2 MARSHALL MEDI… 010005     AL    high  148           105           79.8        
## 3 NORTH ALABAMA… 010006     AL    high  <NA>          <NA>          79          
## 4 MIZELL MEMORI… 010007     AL    low   <NA>          <NA>          57.9        
## 5 CRENSHAW COMM… 010008     AL    low   <NA>          <NA>          81.2        
## 6 ST. VINCENT'S… 010011     AL    high  <NA>          <NA>          88          
## # ℹ 19 more variables: IMM_3 <chr>, OP_18b <chr>, OP_18c <chr>, OP_22 <chr>,
## #   OP_23 <chr>, OP_29 <chr>, OP_31 <chr>, SAFE_USE_OF_OPIOIDS <chr>,
## #   SEP_1 <chr>, SEP_SH_3HR <chr>, SEP_SH_6HR <chr>, SEV_SEP_3HR <chr>,
## #   SEV_SEP_6HR <chr>, STK_02 <chr>, STK_03 <chr>, STK_05 <chr>, STK_06 <chr>,
## #   VTE_1 <chr>, VTE_2 <chr>

Exploring and Preprocessing the Complications_and_Deaths dataset

Viewing and checking for missing values

# Display first 10 rows of Complications_and_Deaths
head(Complications_and_Deaths,10)
## # A tibble: 10 × 18
##    FacilityId FacilityName           Address CityTown State ZipCode CountyParish
##    <chr>      <chr>                  <chr>   <chr>    <chr> <chr>   <chr>       
##  1 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  2 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  3 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  4 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  5 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  6 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  7 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  8 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  9 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## 10 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
## # ℹ 11 more variables: TelephoneNumber <chr>, MeasureId <chr>,
## #   MeasureName <chr>, ComparedToNational <chr>, Denominator <chr>,
## #   Score <chr>, LowerEstimate <chr>, HigherEstimate <chr>, Footnote <chr>,
## #   StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- Complications_and_Deaths %>%
  select_if(is.numeric)

# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
## named list()

Replacing NA values

# Replacing all "Not Applicable" with NA
Complications_and_Deaths <- as.data.frame(sapply(Complications_and_Deaths, function(x) {
  if (is.character(x)) {
    x[x == "Not Applicable"] <- NA
  }
  return(x)
}))

# Replacing all "Not Available" with NA
Complications_and_Deaths <- as.data.frame(sapply(Complications_and_Deaths, function(x) {
  if (is.character(x)) {
    x[x == "Not Available"] <- NA
  }
  return(x)
}))

Pivoting the data wider

deathsClean <- Complications_and_Deaths %>%
  pivot_wider(
    names_from = MeasureId, 
    values_from = c(ComparedToNational, Score), 
    id_cols = c(FacilityName, FacilityId, State)
  )

# Check the new dataframe
dim(deathsClean)
## [1] 4814   41
head(deathsClean)
## # A tibble: 6 × 41
##   FacilityName    FacilityId State ComparedToNational_C…¹ ComparedToNational_M…²
##   <chr>           <chr>      <chr> <chr>                  <chr>                 
## 1 SOUTHEAST HEAL… 010001     AL    No Different Than the… No Different Than the…
## 2 MARSHALL MEDIC… 010005     AL    No Different Than the… No Different Than the…
## 3 NORTH ALABAMA … 010006     AL    No Different Than the… Worse Than the Nation…
## 4 MIZELL MEMORIA… 010007     AL    Number of Cases Too S… Number of Cases Too S…
## 5 CRENSHAW COMMU… 010008     AL    <NA>                   Number of Cases Too S…
## 6 ST. VINCENT'S … 010011     AL    No Different Than the… No Different Than the…
## # ℹ abbreviated names: ¹​ComparedToNational_COMP_HIP_KNEE,
## #   ²​ComparedToNational_MORT_30_AMI
## # ℹ 36 more variables: ComparedToNational_MORT_30_CABG <chr>,
## #   ComparedToNational_MORT_30_COPD <chr>, ComparedToNational_MORT_30_HF <chr>,
## #   ComparedToNational_MORT_30_PN <chr>, ComparedToNational_MORT_30_STK <chr>,
## #   ComparedToNational_PSI_03 <chr>, ComparedToNational_PSI_04 <chr>,
## #   ComparedToNational_PSI_06 <chr>, ComparedToNational_PSI_08 <chr>, …

Exploring and Preprocessing the Payment_and_Value_of_Care dataset

Viewing and checking for missing values

# Display first 10 rows of Payment_and_Value_of_Care
head(Payment_and_Value_of_Care,10)
## # A tibble: 10 × 22
##    FacilityId FacilityName           Address CityTown State ZipCode CountyParish
##    <chr>      <chr>                  <chr>   <chr>    <chr> <chr>   <chr>       
##  1 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  2 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  3 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  4 010001     SOUTHEAST HEALTH MEDI… 1108 R… DOTHAN   AL    36301   HOUSTON     
##  5 010005     MARSHALL MEDICAL CENT… 2505 U… BOAZ     AL    35957   MARSHALL    
##  6 010005     MARSHALL MEDICAL CENT… 2505 U… BOAZ     AL    35957   MARSHALL    
##  7 010005     MARSHALL MEDICAL CENT… 2505 U… BOAZ     AL    35957   MARSHALL    
##  8 010005     MARSHALL MEDICAL CENT… 2505 U… BOAZ     AL    35957   MARSHALL    
##  9 010006     NORTH ALABAMA MEDICAL… 1701 V… FLORENCE AL    35630   LAUDERDALE  
## 10 010006     NORTH ALABAMA MEDICAL… 1701 V… FLORENCE AL    35630   LAUDERDALE  
## # ℹ 15 more variables: TelephoneNumber <chr>, PaymentMeasureId <chr>,
## #   PaymentMeasureName <chr>, PaymentCategory <chr>, Denominator <chr>,
## #   Payment <chr>, LowerEstimate <chr>, HigherEstimate <chr>,
## #   PaymentFootnote <dbl>, ValueOfCareDisplayId <chr>,
## #   ValueOfCareDisplayName <chr>, ValueOfCareCategory <chr>,
## #   ValueOfCareFootnote <dbl>, StartDate <chr>, EndDate <chr>
# Filter dataset to include numeric columns only
num_vars <- Payment_and_Value_of_Care %>%
  select_if(is.numeric)

# Check for missing values
miss_vals <- sapply(num_vars, function(x) sum(is.na(x)))
print(miss_vals)
##     PaymentFootnote ValueOfCareFootnote 
##                9956               10044

Replacing NA values

# Replacing all "Not Applicable" with NA
Payment_and_Value_of_Care <- as.data.frame(sapply(Payment_and_Value_of_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Applicable"] <- NA
  }
  return(x)
}))

# Replacing all "Not Available" with NA
Payment_and_Value_of_Care <- as.data.frame(sapply(Payment_and_Value_of_Care, function(x) {
  if (is.character(x)) {
    x[x == "Not Available"] <- NA
  }
  return(x)
}))

Pivoting the data wider

paymentClean <- Payment_and_Value_of_Care %>%
  pivot_wider(
    names_from = PaymentMeasureId, 
    values_from = c(PaymentCategory, Payment), 
    id_cols = c(FacilityName, FacilityId, State)
  )

# Check the new dataframe
dim(paymentClean)
## [1] 4645   11
head(paymentClean)
## # A tibble: 6 × 11
##   FacilityName    FacilityId State PaymentCategory_PAYM…¹ PaymentCategory_PAYM…²
##   <chr>           <chr>      <chr> <chr>                  <chr>                 
## 1 SOUTHEAST HEAL… 010001     AL    No Different Than the… No Different Than the…
## 2 MARSHALL MEDIC… 010005     AL    No Different Than the… No Different Than the…
## 3 NORTH ALABAMA … 010006     AL    Greater Than the Nati… No Different Than the…
## 4 MIZELL MEMORIA… 010007     AL    Number of Cases Too S… No Different Than the…
## 5 CRENSHAW COMMU… 010008     AL    Number of Cases Too S… Number of Cases Too S…
## 6 ST. VINCENT'S … 010011     AL    No Different Than the… No Different Than the…
## # ℹ abbreviated names: ¹​PaymentCategory_PAYM_30_AMI,
## #   ²​PaymentCategory_PAYM_30_HF
## # ℹ 6 more variables: PaymentCategory_PAYM_30_PN <chr>,
## #   PaymentCategory_PAYM_90_HIP_KNEE <chr>, Payment_PAYM_30_AMI <chr>,
## #   Payment_PAYM_30_HF <chr>, Payment_PAYM_30_PN <chr>,
## #   Payment_PAYM_90_HIP_KNEE <chr>

Joining and cleaning the datasets

Joining the datasets based on FacilityId

HipKneeCleanTest <- readmissionsClean %>%
  full_join(HCAHPSClean, by = "FacilityId") %>%
  full_join(careClean, by = "FacilityId") %>%
  full_join(deathsClean, by = "FacilityId") %>%
  full_join(paymentClean, by = "FacilityId")

head(HipKneeCleanTest)
## # A tibble: 6 × 451
##   FacilityName.x                  FacilityId State.x NumberOfDischarges_HIP-KN…¹
##   <chr>                           <chr>      <chr>                         <dbl>
## 1 SOUTHEAST HEALTH MEDICAL CENTER 010001     AL                               NA
## 2 MARSHALL MEDICAL CENTERS        010005     AL                               NA
## 3 NORTH ALABAMA MEDICAL CENTER    010006     AL                               NA
## 4 MIZELL MEMORIAL HOSPITAL        010007     AL                               NA
## 5 CRENSHAW COMMUNITY HOSPITAL     010008     AL                               NA
## 6 ST. VINCENT'S EAST              010011     AL                               NA
## # ℹ abbreviated name: ¹​`NumberOfDischarges_HIP-KNEE`
## # ℹ 447 more variables: `ExcessReadmissionRatio_HIP-KNEE` <dbl>,
## #   `PredictedReadmissionRate_HIP-KNEE` <dbl>,
## #   `ExpectedReadmissionRate_HIP-KNEE` <dbl>,
## #   `NumberOfReadmissions_HIP-KNEE` <dbl>, FacilityName.y <chr>, State.y <chr>,
## #   PatientSurveyStarRating_H_COMP_1_A_P <chr>,
## #   PatientSurveyStarRating_H_COMP_1_SN_P <chr>, …

Removing redundant columns

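# Drop the suffixed duplicates of FacilityName and State created by the joins
# (dplyr appends ".x" and ".y" to clashing column names)
HipKneeCleanTest <- HipKneeCleanTest %>%
  select(-matches("\\.(x|y)$"))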
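
Dropping the suffixed columns leaves only the FacilityName and State that happened to survive without a name clash, so a facility absent from that particular table may end up with missing labels. One way to avoid this, sketched below with hypothetical toy tables and a hypothetical facility_lookup, is to join on the measure columns alone and attach the identifying columns once at the end.

```r
library(dplyr)

# Toy stand-ins for two of the cleaned tables (hypothetical values)
tab_a <- data.frame(FacilityId = c("010001", "010005"),
                    FacilityName = c("HOSPITAL A", "HOSPITAL B"),
                    measure_a = c(1, 2))
tab_b <- data.frame(FacilityId = c("010001", "010006"),
                    FacilityName = c("HOSPITAL A", "HOSPITAL C"),
                    measure_b = c(3, 4))

# Build one lookup of identifying columns from every source
facility_lookup <- bind_rows(
  select(tab_a, FacilityId, FacilityName),
  select(tab_b, FacilityId, FacilityName)
) %>%
  distinct(FacilityId, .keep_all = TRUE)

# Join on measures only, then attach the labels once at the end
joined <- select(tab_a, -FacilityName) %>%
  full_join(select(tab_b, -FacilityName), by = "FacilityId") %>%
  left_join(facility_lookup, by = "FacilityId")
```

In this toy case all three facilities keep a name and no suffixed columns are created, so no cleanup by regular expression is needed afterwards.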

Checking for NA Values

# Checking the dimensions
dim(HipKneeCleanTest)

# Count NA values in each column
na_counts <- sapply(HipKneeCleanTest, function(x) sum(is.na(x)))

# View the NA counts
print(na_counts)

Removing columns with more than 80% NA values

# Calculate the percentage of NA values for each column
na_percentage <- sapply(HipKneeCleanTest, function(x) mean(is.na(x)))

# Remove columns where more than 80% of the values are NA
HipKneeCleanTest <- HipKneeCleanTest[, na_percentage <= 0.8]

# Count NA values in each column
na_counts <- sapply(HipKneeCleanTest, function(x) sum(is.na(x)))

# View the NA counts
print(na_counts)

# Check the dimensions
dim(HipKneeCleanTest)

Removing answer percent and survey response percent columns

# Remove columns containing 'AnswerPercent' or 'SurveyResponseRate'
HipKneeCleanTest <- HipKneeCleanTest %>%
  select(-matches("AnswerPercent|SurveyResponseRate"))

# Check the dimensions
dim(HipKneeCleanTest)
## [1] 4816   87

Removing compared-to-national and payment category columns

# Remove columns containing 'ComparedToNational' or 'PaymentCategory'
HipKneeCleanTest <- HipKneeCleanTest %>%
  select(-matches("ComparedToNational|PaymentCategory"))

# Check the dimensions
dim(HipKneeCleanTest)
## [1] 4816   67

Checking data structure

str(HipKneeCleanTest)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
##  $ FacilityId                                      : chr [1:4816] "010001" "010005" "010006" "010007" ...
##  $ ExcessReadmissionRatio_HIP-KNEE                 : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
##  $ PredictedReadmissionRate_HIP-KNEE               : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
##  $ ExpectedReadmissionRate_HIP-KNEE                : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
##  $ NumberOfReadmissions_HIP-KNEE                   : num [1:4816] 3 1 7 5 NA 9 2 10 NA 1 ...
##  $ PatientSurveyStarRating_H_COMP_1_STAR_RATING    : chr [1:4816] "3" "3" "2" "3" ...
##  $ PatientSurveyStarRating_H_COMP_2_STAR_RATING    : chr [1:4816] "4" "4" "3" "5" ...
##  $ PatientSurveyStarRating_H_COMP_3_STAR_RATING    : chr [1:4816] "3" "2" "2" "4" ...
##  $ PatientSurveyStarRating_H_COMP_5_STAR_RATING    : chr [1:4816] "3" "3" "2" "3" ...
##  $ PatientSurveyStarRating_H_COMP_6_STAR_RATING    : chr [1:4816] "4" "3" "3" "4" ...
##  $ PatientSurveyStarRating_H_COMP_7_STAR_RATING    : chr [1:4816] "4" "3" "2" "4" ...
##  $ PatientSurveyStarRating_H_CLEAN_STAR_RATING     : chr [1:4816] "3" "2" "1" "2" ...
##  $ PatientSurveyStarRating_H_QUIET_STAR_RATING     : chr [1:4816] "4" "4" "4" "4" ...
##  $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: chr [1:4816] "4" "3" "2" "4" ...
##  $ PatientSurveyStarRating_H_RECMND_STAR_RATING    : chr [1:4816] "4" "3" "2" "4" ...
##  $ PatientSurveyStarRating_H_STAR_RATING           : chr [1:4816] "4" "3" "2" "4" ...
##  $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE     : chr [1:4816] "89" "90" "88" "91" ...
##  $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE     : chr [1:4816] "91" "92" "89" "95" ...
##  $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE     : chr [1:4816] "81" "75" "75" "88" ...
##  $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE     : chr [1:4816] "77" "76" "71" "77" ...
##  $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE     : chr [1:4816] "87" "86" "83" "87" ...
##  $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE     : chr [1:4816] "82" "79" "77" "82" ...
##  $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE      : chr [1:4816] "84" "80" "74" "80" ...
##  $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE      : chr [1:4816] "86" "85" "85" "87" ...
##  $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : chr [1:4816] "89" "85" "82" "89" ...
##  $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE     : chr [1:4816] "90" "83" "79" "88" ...
##  $ EDV                                             : chr [1:4816] "high" "high" "high" "low" ...
##  $ ED_2_Strata_1                                   : chr [1:4816] NA "148" NA NA ...
##  $ HCP_COVID_19                                    : chr [1:4816] "80.7" "79.8" "79" "57.9" ...
##  $ IMM_3                                           : chr [1:4816] "95" "80" "67" "53" ...
##  $ OP_18b                                          : chr [1:4816] "215" "147" "177" "130" ...
##  $ OP_18c                                          : chr [1:4816] "317" "266" NA "216" ...
##  $ OP_22                                           : chr [1:4816] "5" "3" "1" "4" ...
##  $ OP_23                                           : chr [1:4816] NA NA "69" NA ...
##  $ OP_29                                           : chr [1:4816] "47" "96" "85" "23" ...
##  $ SAFE_USE_OF_OPIOIDS                             : chr [1:4816] "14" "19" "17" NA ...
##  $ SEP_1                                           : chr [1:4816] "66" "74" "56" "86" ...
##  $ SEP_SH_3HR                                      : chr [1:4816] "70" "88" "77" NA ...
##  $ SEP_SH_6HR                                      : chr [1:4816] "100" "91" "81" NA ...
##  $ SEV_SEP_3HR                                     : chr [1:4816] "79" "88" "78" "89" ...
##  $ SEV_SEP_6HR                                     : chr [1:4816] "95" "96" "86" "97" ...
##  $ STK_02                                          : chr [1:4816] "98" "100" "96" NA ...
##  $ STK_05                                          : chr [1:4816] NA "91" NA NA ...
##  $ STK_06                                          : chr [1:4816] NA NA "97" NA ...
##  $ VTE_1                                           : chr [1:4816] "98" NA NA NA ...
##  $ VTE_2                                           : chr [1:4816] "99" NA "97" NA ...
##  $ Score_COMP_HIP_KNEE                             : chr [1:4816] "2.7" "2.3" "4.6" NA ...
##  $ Score_MORT_30_AMI                               : chr [1:4816] "12" "13.6" "16.5" NA ...
##  $ Score_MORT_30_COPD                              : chr [1:4816] "8.8" "9.9" "9.9" "13.7" ...
##  $ Score_MORT_30_HF                                : chr [1:4816] "8.9" "14.9" "12.5" "12.5" ...
##  $ Score_MORT_30_PN                                : chr [1:4816] "18" "23.3" "19.5" "28.5" ...
##  $ Score_MORT_30_STK                               : chr [1:4816] "14.8" "15.3" "17.2" NA ...
##  $ Score_PSI_03                                    : chr [1:4816] "0.39" "0.94" "1.39" "0.42" ...
##  $ Score_PSI_04                                    : chr [1:4816] "184.68" "183.49" "173.63" NA ...
##  $ Score_PSI_06                                    : chr [1:4816] "0.23" "0.22" "0.36" "0.24" ...
##  $ Score_PSI_08                                    : chr [1:4816] "0.10" "0.09" "0.08" "0.09" ...
##  $ Score_PSI_09                                    : chr [1:4816] "2.39" "2.69" "5.43" "2.49" ...
##  $ Score_PSI_10                                    : chr [1:4816] "1.14" "1.37" "1.26" "1.57" ...
##  $ Score_PSI_11                                    : chr [1:4816] "13.83" "7.19" "7.37" "8.45" ...
##  $ Score_PSI_12                                    : chr [1:4816] "4.49" "3.01" "3.36" "3.89" ...
##  $ Score_PSI_13                                    : chr [1:4816] "8.05" "4.46" "4.37" "5.19" ...
##  $ Score_PSI_14                                    : chr [1:4816] "1.69" "1.87" "1.76" NA ...
##  $ Score_PSI_15                                    : chr [1:4816] "0.93" "0.91" "1.34" "1.08" ...
##  $ Score_PSI_90                                    : chr [1:4816] "1.21" "0.97" "1.17" "0.95" ...
##  $ FacilityName                                    : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
##  $ State                                           : chr [1:4816] "AL" "AL" "AL" "AL" ...
##  $ Payment_PAYM_90_HIP_KNEE                        : chr [1:4816] "$22,212" "$18,030" "$21,898" NA ...
# Convert survey, score, and measure columns from character to numeric
# (any value that cannot be parsed as a number becomes NA)
HipKneeCleanTest <- HipKneeCleanTest %>%
  mutate(across(c(starts_with("PatientSurveyStarRating_"),
                  starts_with("HcahpsLinearMeanValue_"),
                  starts_with("Score_"),
                  starts_with("ED_"),
                  starts_with("IMM_"),
                  starts_with("OP_"),
                  starts_with("SEP_"),
                  starts_with("SEV_"),
                  starts_with("STK_"),
                  starts_with("VTE_"),
                  starts_with("SAFE_"),
                  starts_with("HCP_")),
                ~ as.numeric(as.character(.x))))

# View the structure
str(HipKneeCleanTest)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
##  $ FacilityId                                      : chr [1:4816] "010001" "010005" "010006" "010007" ...
##  $ ExcessReadmissionRatio_HIP-KNEE                 : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
##  $ PredictedReadmissionRate_HIP-KNEE               : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
##  $ ExpectedReadmissionRate_HIP-KNEE                : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
##  $ NumberOfReadmissions_HIP-KNEE                   : num [1:4816] 3 1 7 5 NA 9 2 10 NA 1 ...
##  $ PatientSurveyStarRating_H_COMP_1_STAR_RATING    : num [1:4816] 3 3 2 3 NA 3 3 3 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_2_STAR_RATING    : num [1:4816] 4 4 3 5 NA 3 4 4 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_3_STAR_RATING    : num [1:4816] 3 2 2 4 NA 4 3 2 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_5_STAR_RATING    : num [1:4816] 3 3 2 3 NA 3 3 2 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_6_STAR_RATING    : num [1:4816] 4 3 3 4 NA 3 3 2 NA 3 ...
##  $ PatientSurveyStarRating_H_COMP_7_STAR_RATING    : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
##  $ PatientSurveyStarRating_H_CLEAN_STAR_RATING     : num [1:4816] 3 2 1 2 NA 2 2 1 NA 4 ...
##  $ PatientSurveyStarRating_H_QUIET_STAR_RATING     : num [1:4816] 4 4 4 4 NA 4 4 3 NA 5 ...
##  $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: num [1:4816] 4 3 2 4 NA 3 2 3 NA 4 ...
##  $ PatientSurveyStarRating_H_RECMND_STAR_RATING    : num [1:4816] 4 3 2 4 NA 4 2 3 NA 4 ...
##  $ PatientSurveyStarRating_H_STAR_RATING           : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
##  $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE     : num [1:4816] 89 90 88 91 NA 90 91 89 NA 92 ...
##  $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE     : num [1:4816] 91 92 89 95 NA 90 91 91 NA 92 ...
##  $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE     : num [1:4816] 81 75 75 88 NA 85 80 78 NA 85 ...
##  $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE     : num [1:4816] 77 76 71 77 NA 76 76 72 NA 78 ...
##  $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE     : num [1:4816] 87 86 83 87 NA 86 86 81 NA 86 ...
##  $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE     : num [1:4816] 82 79 77 82 NA 81 79 80 NA 83 ...
##  $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE      : num [1:4816] 84 80 74 80 NA 81 83 78 NA 88 ...
##  $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE      : num [1:4816] 86 85 85 87 NA 84 84 82 NA 89 ...
##  $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : num [1:4816] 89 85 82 89 NA 88 83 85 NA 90 ...
##  $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE     : num [1:4816] 90 83 79 88 NA 87 80 84 NA 91 ...
##  $ EDV                                             : chr [1:4816] "high" "high" "high" "low" ...
##  $ ED_2_Strata_1                                   : num [1:4816] NA 148 NA NA NA NA NA NA NA NA ...
##  $ HCP_COVID_19                                    : num [1:4816] 80.7 79.8 79 57.9 81.2 88 69.8 87.3 95.9 85.3 ...
##  $ IMM_3                                           : num [1:4816] 95 80 67 53 45 81 65 93 98 81 ...
##  $ OP_18b                                          : num [1:4816] 215 147 177 130 118 206 160 185 102 145 ...
##  $ OP_18c                                          : num [1:4816] 317 266 NA 216 98 124 220 220 NA 324 ...
##  $ OP_22                                           : num [1:4816] 5 3 1 4 0 5 4 3 0 2 ...
##  $ OP_23                                           : num [1:4816] NA NA 69 NA NA 47 NA 73 NA 35 ...
##  $ OP_29                                           : num [1:4816] 47 96 85 23 67 100 100 NA NA 82 ...
##  $ SAFE_USE_OF_OPIOIDS                             : num [1:4816] 14 19 17 NA NA 20 14 23 NA 17 ...
##  $ SEP_1                                           : num [1:4816] 66 74 56 86 NA 51 92 77 NA 87 ...
##  $ SEP_SH_3HR                                      : num [1:4816] 70 88 77 NA NA 78 94 83 NA 90 ...
##  $ SEP_SH_6HR                                      : num [1:4816] 100 91 81 NA NA 81 83 100 NA 94 ...
##  $ SEV_SEP_3HR                                     : num [1:4816] 79 88 78 89 NA 69 95 85 NA 94 ...
##  $ SEV_SEP_6HR                                     : num [1:4816] 95 96 86 97 NA 91 99 97 NA 99 ...
##  $ STK_02                                          : num [1:4816] 98 100 96 NA NA 93 NA 99 NA NA ...
##  $ STK_05                                          : num [1:4816] NA 91 NA NA NA NA NA NA NA NA ...
##  $ STK_06                                          : num [1:4816] NA NA 97 NA NA NA NA NA NA NA ...
##  $ VTE_1                                           : num [1:4816] 98 NA NA NA NA 79 89 84 44 59 ...
##  $ VTE_2                                           : num [1:4816] 99 NA 97 NA NA 88 93 94 NA NA ...
##  $ Score_COMP_HIP_KNEE                             : num [1:4816] 2.7 2.3 4.6 NA NA 3.5 3.8 3.5 NA 4.3 ...
##  $ Score_MORT_30_AMI                               : num [1:4816] 12 13.6 16.5 NA NA 13.2 13.8 13.1 NA NA ...
##  $ Score_MORT_30_COPD                              : num [1:4816] 8.8 9.9 9.9 13.7 NA 10.3 NA 9.2 NA 7.8 ...
##  $ Score_MORT_30_HF                                : num [1:4816] 8.9 14.9 12.5 12.5 NA 13.5 13.6 9.9 NA 16.9 ...
##  $ Score_MORT_30_PN                                : num [1:4816] 18 23.3 19.5 28.5 NA 20.9 22 17.2 NA 26.1 ...
##  $ Score_MORT_30_STK                               : num [1:4816] 14.8 15.3 17.2 NA NA 12.3 NA 13.2 NA 17.3 ...
##  $ Score_PSI_03                                    : num [1:4816] 0.39 0.94 1.39 0.42 0.54 0.13 0.41 0.63 0.57 0.47 ...
##  $ Score_PSI_04                                    : num [1:4816] 185 183 174 NA NA ...
##  $ Score_PSI_06                                    : num [1:4816] 0.23 0.22 0.36 0.24 0.25 0.24 0.24 0.21 0.25 0.22 ...
##  $ Score_PSI_08                                    : num [1:4816] 0.1 0.09 0.08 0.09 0.09 0.08 0.09 0.09 0.09 0.09 ...
##  $ Score_PSI_09                                    : num [1:4816] 2.39 2.69 5.43 2.49 NA 1.88 2.44 3.29 2.44 2.58 ...
##  $ Score_PSI_10                                    : num [1:4816] 1.14 1.37 1.26 1.57 NA 1.72 1.51 1.2 1.57 NA ...
##  $ Score_PSI_11                                    : num [1:4816] 13.83 7.19 7.37 8.45 NA ...
##  $ Score_PSI_12                                    : num [1:4816] 4.49 3.01 3.36 3.89 NA 3.04 3.32 3.67 3.56 5.63 ...
##  $ Score_PSI_13                                    : num [1:4816] 8.05 4.46 4.37 5.19 NA 5.55 4.88 6.08 5.18 NA ...
##  $ Score_PSI_14                                    : num [1:4816] 1.69 1.87 1.76 NA NA 1.86 2.46 2.77 NA 1.83 ...
##  $ Score_PSI_15                                    : num [1:4816] 0.93 0.91 1.34 1.08 NA 1.18 1.04 0.84 NA 0.88 ...
##  $ Score_PSI_90                                    : num [1:4816] 1.21 0.97 1.17 0.95 NA 0.72 0.89 1.17 0.98 1.05 ...
##  $ FacilityName                                    : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
##  $ State                                           : chr [1:4816] "AL" "AL" "AL" "AL" ...
##  $ Payment_PAYM_90_HIP_KNEE                        : chr [1:4816] "$22,212" "$18,030" "$21,898" NA ...
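
The numeric conversion turns any text that is not a number into NA without reporting which values were lost. The sketch below shows one way this could be checked before converting. It uses a hypothetical toy vector, and the "Not Available" placeholder is an assumption for illustration, not a value confirmed in this dataset.

```r
# Hypothetical toy column; values are illustrative, not taken from the dataset
raw <- c("95", "80", "Not Available", NA, "53")

# Coerce to numeric; unparseable text becomes NA (coercion warning suppressed)
converted <- suppressWarnings(as.numeric(raw))

# Entries that held text before conversion but are NA afterwards
newly_na <- !is.na(raw) & is.na(converted)

raw[newly_na]   # the offending text values
sum(newly_na)   # how many values were lost to coercion
```

Running a check like this per column would distinguish values that were already missing from values that were dropped by the conversion.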

Fixing the payment column

# Strip "$" and "," from the payment strings, then convert to numeric
HipKneeCleanTest <- HipKneeCleanTest %>%
  mutate(across(starts_with("Payment_"),
                ~ as.numeric(gsub("[\\$,]", "", .x))))

# Checking the structure
str(HipKneeCleanTest)
## tibble [4,816 × 67] (S3: tbl_df/tbl/data.frame)
##  $ FacilityId                                      : chr [1:4816] "010001" "010005" "010006" "010007" ...
##  $ ExcessReadmissionRatio_HIP-KNEE                 : num [1:4816] 0.892 0.798 1.247 0.992 NA ...
##  $ PredictedReadmissionRate_HIP-KNEE               : num [1:4816] 3.53 3.76 5.52 4.34 NA ...
##  $ ExpectedReadmissionRate_HIP-KNEE                : num [1:4816] 3.96 4.72 4.43 4.37 NA ...
##  $ NumberOfReadmissions_HIP-KNEE                   : num [1:4816] 3 1 7 5 NA 9 2 10 NA 1 ...
##  $ PatientSurveyStarRating_H_COMP_1_STAR_RATING    : num [1:4816] 3 3 2 3 NA 3 3 3 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_2_STAR_RATING    : num [1:4816] 4 4 3 5 NA 3 4 4 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_3_STAR_RATING    : num [1:4816] 3 2 2 4 NA 4 3 2 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_5_STAR_RATING    : num [1:4816] 3 3 2 3 NA 3 3 2 NA 4 ...
##  $ PatientSurveyStarRating_H_COMP_6_STAR_RATING    : num [1:4816] 4 3 3 4 NA 3 3 2 NA 3 ...
##  $ PatientSurveyStarRating_H_COMP_7_STAR_RATING    : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
##  $ PatientSurveyStarRating_H_CLEAN_STAR_RATING     : num [1:4816] 3 2 1 2 NA 2 2 1 NA 4 ...
##  $ PatientSurveyStarRating_H_QUIET_STAR_RATING     : num [1:4816] 4 4 4 4 NA 4 4 3 NA 5 ...
##  $ PatientSurveyStarRating_H_HSP_RATING_STAR_RATING: num [1:4816] 4 3 2 4 NA 3 2 3 NA 4 ...
##  $ PatientSurveyStarRating_H_RECMND_STAR_RATING    : num [1:4816] 4 3 2 4 NA 4 2 3 NA 4 ...
##  $ PatientSurveyStarRating_H_STAR_RATING           : num [1:4816] 4 3 2 4 NA 3 3 3 NA 4 ...
##  $ HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE     : num [1:4816] 89 90 88 91 NA 90 91 89 NA 92 ...
##  $ HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE     : num [1:4816] 91 92 89 95 NA 90 91 91 NA 92 ...
##  $ HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE     : num [1:4816] 81 75 75 88 NA 85 80 78 NA 85 ...
##  $ HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE     : num [1:4816] 77 76 71 77 NA 76 76 72 NA 78 ...
##  $ HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE     : num [1:4816] 87 86 83 87 NA 86 86 81 NA 86 ...
##  $ HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE     : num [1:4816] 82 79 77 82 NA 81 79 80 NA 83 ...
##  $ HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE      : num [1:4816] 84 80 74 80 NA 81 83 78 NA 88 ...
##  $ HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE      : num [1:4816] 86 85 85 87 NA 84 84 82 NA 89 ...
##  $ HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE : num [1:4816] 89 85 82 89 NA 88 83 85 NA 90 ...
##  $ HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE     : num [1:4816] 90 83 79 88 NA 87 80 84 NA 91 ...
##  $ EDV                                             : chr [1:4816] "high" "high" "high" "low" ...
##  $ ED_2_Strata_1                                   : num [1:4816] NA 148 NA NA NA NA NA NA NA NA ...
##  $ HCP_COVID_19                                    : num [1:4816] 80.7 79.8 79 57.9 81.2 88 69.8 87.3 95.9 85.3 ...
##  $ IMM_3                                           : num [1:4816] 95 80 67 53 45 81 65 93 98 81 ...
##  $ OP_18b                                          : num [1:4816] 215 147 177 130 118 206 160 185 102 145 ...
##  $ OP_18c                                          : num [1:4816] 317 266 NA 216 98 124 220 220 NA 324 ...
##  $ OP_22                                           : num [1:4816] 5 3 1 4 0 5 4 3 0 2 ...
##  $ OP_23                                           : num [1:4816] NA NA 69 NA NA 47 NA 73 NA 35 ...
##  $ OP_29                                           : num [1:4816] 47 96 85 23 67 100 100 NA NA 82 ...
##  $ SAFE_USE_OF_OPIOIDS                             : num [1:4816] 14 19 17 NA NA 20 14 23 NA 17 ...
##  $ SEP_1                                           : num [1:4816] 66 74 56 86 NA 51 92 77 NA 87 ...
##  $ SEP_SH_3HR                                      : num [1:4816] 70 88 77 NA NA 78 94 83 NA 90 ...
##  $ SEP_SH_6HR                                      : num [1:4816] 100 91 81 NA NA 81 83 100 NA 94 ...
##  $ SEV_SEP_3HR                                     : num [1:4816] 79 88 78 89 NA 69 95 85 NA 94 ...
##  $ SEV_SEP_6HR                                     : num [1:4816] 95 96 86 97 NA 91 99 97 NA 99 ...
##  $ STK_02                                          : num [1:4816] 98 100 96 NA NA 93 NA 99 NA NA ...
##  $ STK_05                                          : num [1:4816] NA 91 NA NA NA NA NA NA NA NA ...
##  $ STK_06                                          : num [1:4816] NA NA 97 NA NA NA NA NA NA NA ...
##  $ VTE_1                                           : num [1:4816] 98 NA NA NA NA 79 89 84 44 59 ...
##  $ VTE_2                                           : num [1:4816] 99 NA 97 NA NA 88 93 94 NA NA ...
##  $ Score_COMP_HIP_KNEE                             : num [1:4816] 2.7 2.3 4.6 NA NA 3.5 3.8 3.5 NA 4.3 ...
##  $ Score_MORT_30_AMI                               : num [1:4816] 12 13.6 16.5 NA NA 13.2 13.8 13.1 NA NA ...
##  $ Score_MORT_30_COPD                              : num [1:4816] 8.8 9.9 9.9 13.7 NA 10.3 NA 9.2 NA 7.8 ...
##  $ Score_MORT_30_HF                                : num [1:4816] 8.9 14.9 12.5 12.5 NA 13.5 13.6 9.9 NA 16.9 ...
##  $ Score_MORT_30_PN                                : num [1:4816] 18 23.3 19.5 28.5 NA 20.9 22 17.2 NA 26.1 ...
##  $ Score_MORT_30_STK                               : num [1:4816] 14.8 15.3 17.2 NA NA 12.3 NA 13.2 NA 17.3 ...
##  $ Score_PSI_03                                    : num [1:4816] 0.39 0.94 1.39 0.42 0.54 0.13 0.41 0.63 0.57 0.47 ...
##  $ Score_PSI_04                                    : num [1:4816] 185 183 174 NA NA ...
##  $ Score_PSI_06                                    : num [1:4816] 0.23 0.22 0.36 0.24 0.25 0.24 0.24 0.21 0.25 0.22 ...
##  $ Score_PSI_08                                    : num [1:4816] 0.1 0.09 0.08 0.09 0.09 0.08 0.09 0.09 0.09 0.09 ...
##  $ Score_PSI_09                                    : num [1:4816] 2.39 2.69 5.43 2.49 NA 1.88 2.44 3.29 2.44 2.58 ...
##  $ Score_PSI_10                                    : num [1:4816] 1.14 1.37 1.26 1.57 NA 1.72 1.51 1.2 1.57 NA ...
##  $ Score_PSI_11                                    : num [1:4816] 13.83 7.19 7.37 8.45 NA ...
##  $ Score_PSI_12                                    : num [1:4816] 4.49 3.01 3.36 3.89 NA 3.04 3.32 3.67 3.56 5.63 ...
##  $ Score_PSI_13                                    : num [1:4816] 8.05 4.46 4.37 5.19 NA 5.55 4.88 6.08 5.18 NA ...
##  $ Score_PSI_14                                    : num [1:4816] 1.69 1.87 1.76 NA NA 1.86 2.46 2.77 NA 1.83 ...
##  $ Score_PSI_15                                    : num [1:4816] 0.93 0.91 1.34 1.08 NA 1.18 1.04 0.84 NA 0.88 ...
##  $ Score_PSI_90                                    : num [1:4816] 1.21 0.97 1.17 0.95 NA 0.72 0.89 1.17 0.98 1.05 ...
##  $ FacilityName                                    : chr [1:4816] "SOUTHEAST HEALTH MEDICAL CENTER" "MARSHALL MEDICAL CENTERS" "NORTH ALABAMA MEDICAL CENTER" "MIZELL MEMORIAL HOSPITAL" ...
##  $ State                                           : chr [1:4816] "AL" "AL" "AL" "AL" ...
##  $ Payment_PAYM_90_HIP_KNEE                        : num [1:4816] 22212 18030 21898 NA NA ...
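
As a quick illustration of the payment cleanup, the sketch below applies the same idea to a toy vector in the "$12,345" format seen in the output above. The variable names are hypothetical.

```r
# Toy payment strings in the same format as Payment_PAYM_90_HIP_KNEE
payment_raw <- c("$22,212", "$18,030", NA)

# "$" and "," are literal inside a character class, so no escaping is needed here
payment_num <- as.numeric(gsub("[$,]", "", payment_raw))

payment_num
```

Missing values pass through unchanged, since `gsub()` leaves NA as NA and `as.numeric()` keeps it.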

Encoding categorical variables

Identify Categorical Variables

# Helper that flags factor or character columns as categorical
is_categorical <- function(x) is.factor(x) || is.character(x)

# Apply the helper to every column in the dataset
categorical_vars <- sapply(HipKneeCleanTest, is_categorical)

# Print the names of all categorical variables
categorical <- names(HipKneeCleanTest)[categorical_vars]
categorical
## [1] "FacilityId"   "EDV"          "FacilityName" "State"

Ordinal encode the EDV column

# Define the encoding mapping (NA values are left as NA for now)
encoding_map <- c(
  'low' = 1,
  'medium' = 2,
  'high' = 3,
  'very high' = 4
)
# Integer (ordinal) encoding is used rather than dummy variables because the EDV levels have a natural order

# Copy HipKneeCleanTest into HipKneeTest so the cleaned dataset stays separate from the encoded test dataset
HipKneeTest <- HipKneeCleanTest %>%
  mutate(EDV = recode(EDV, !!!encoding_map))

# Print the first 20 values of EDV in HipKneeCleanTest and HipKneeTest to confirm the encoding
cat("HipKneeCleanTest")
## HipKneeCleanTest
print(head(HipKneeCleanTest$EDV, 20))
##  [1] "high"      "high"      "high"      "low"       "low"       "high"     
##  [7] "low"       "medium"    "low"       "medium"    "low"       "low"      
## [13] "high"      "high"      "very high" "very high" "low"       "high"     
## [19] "low"       "very high"
cat("HipKneeTest")
## HipKneeTest
print(head(HipKneeTest$EDV, 20))
##  [1] 3 3 3 1 1 3 1 2 1 2 1 1 3 3 4 4 1 3 1 4
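
An alternative way to get the same integer codes is to declare the level order once with `factor()`. This is a minimal sketch on a toy vector, not the method used in this analysis. The names `edv_levels`, `edv_toy`, and `edv_code` are hypothetical.

```r
# Level order from lowest to highest emergency department volume
edv_levels <- c("low", "medium", "high", "very high")

# Toy EDV values, including a missing entry
edv_toy <- c("high", "low", "very high", "medium", NA)

# as.integer() on a factor returns the position of each value in `levels`
edv_code <- as.integer(factor(edv_toy, levels = edv_levels))

edv_code
```

With this approach any value that is not listed in `edv_levels` becomes NA, which makes unexpected categories easy to spot.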

Encode each state in alphabetical order by state name

# Manually map each state abbreviation to a code assigned alphabetically by state name.
# Codes are zero-padded character strings so they are treated as labels rather than ordinal numbers.
state_mapping <- c(
  "AL" = "001",
  "AK" = "002",
  "AZ" = "003",
  "AR" = "004",
  "CA" = "005",
  "CO" = "006",
  "CT" = "007",
  "DE" = "008",
  "FL" = "009",
  "GA" = "010",
  "HI" = "011",
  "ID" = "012",
  "IL" = "013",
  "IN" = "014",
  "IA" = "015",
  "KS" = "016",
  "KY" = "017",
  "LA" = "018",
  "ME" = "019",
  "MD" = "020",
  "MA" = "021",
  "MI" = "022",
  "MN" = "023",
  "MS" = "024",
  "MO" = "025",
  "MT" = "026",
  "NE" = "027",
  "NV" = "028",
  "NH" = "029",
  "NJ" = "030",
  "NM" = "031",
  "NY" = "032",
  "NC" = "033",
  "ND" = "034",
  "OH" = "035",
  "OK" = "036",
  "OR" = "037",
  "PA" = "038",
  "RI" = "039",
  "SC" = "040",
  "SD" = "041",
  "TN" = "042",
  "TX" = "043",
  "UT" = "044",
  "VT" = "045",
  "VA" = "046",
  "WA" = "047",
  "WV" = "048",
  "WI" = "049",
  "WY" = "050"
)

# Create new "StateCode" column with the encoded values
HipKneeTest <- HipKneeTest %>%
  mutate(StateCode = state_mapping[State])

# Preview the first 100 rows of the "State" and "StateCode" columns (the tibble prints only the first 10)
print("State and StateCode Columns")
## [1] "State and StateCode Columns"
print(head(HipKneeTest[c("State", "StateCode")], 100))
## # A tibble: 100 × 2
##    State StateCode
##    <chr> <chr>    
##  1 AL    001      
##  2 AL    001      
##  3 AL    001      
##  4 AL    001      
##  5 AL    001      
##  6 AL    001      
##  7 AL    001      
##  8 AL    001      
##  9 AL    001      
## 10 AL    001      
## # ℹ 90 more rows
# Print all unique values in the "StateCode" column; NA marks rows whose State is missing or not in state_mapping
print("Unique StateCode Values")
## [1] "Unique StateCode Values"
print(unique(HipKneeTest$StateCode))
##  [1] "001" "002" "003" "004" "005" "006" "007" "008" NA    "009" "010" "011"
## [13] "012" "013" "014" "015" "016" "017" "018" "019" "020" "021" "022" "023"
## [25] "024" "025" "026" "027" "028" "029" "030" "031" "032" "033" "034" "035"
## [37] "036" "037" "038" "039" "040" "041" "042" "043" "044" "045" "046" "047"
## [49] "048" "049" "050"
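
The NA among the unique codes indicates that at least one State value has no entry in the mapping. The sketch below shows how such values could be listed, and how the same 50-state mapping could be generated from base R's built-in `state.abb` vector, which is ordered by state name. The toy `states_toy` vector is hypothetical; which abbreviations actually produce the NA in this dataset is not confirmed here.

```r
# state.abb ships with base R (datasets package) and is ordered by state name
state_mapping_alt <- setNames(sprintf("%03d", seq_along(state.abb)), state.abb)

# Hypothetical State values, including two that are not among the 50 states
states_toy <- c("AL", "DE", "DC", "FL", "PR")

# Values with no entry in the mapping; these would receive NA as StateCode
unmapped <- setdiff(unique(states_toy), names(state_mapping_alt))

# Look up the codes; unname() drops the abbreviation labels from the result
codes <- unname(state_mapping_alt[states_toy])

unmapped
codes
```

Listing the unmapped values first makes it a deliberate choice whether to extend the mapping or to leave those rows as NA.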

Collinearity and Feature Removal

Remove correlated and unnecessary variables

# Specify columns to remove
columns_to_remove <- c(
  "ED_2_Strata_1",
  "OP_23",
  "VTE_2",
  "OP_18c",
  "OP_22",
  "STK_02",
  "STK_05",
  "STK_06",
  "HcahpsLinearMeanValue_H_RECMND_LINEAR_SCORE",
  "NumberOfReadmissions_HIP-KNEE",
  "ExcessReadmissionRatio_HIP-KNEE",
  "ExpectedReadmissionRate_HIP-KNEE",
  "SEP_1",
  "SEV_SEP_6HR",
  "SEV_SEP_3HR",
  "SEP_SH_6HR",
  "SEP_SH_3HR",
  "Score_PSI_90",
  "PatientSurveyStarRating_H_COMP_1_STAR_RATING",    
  "PatientSurveyStarRating_H_COMP_2_STAR_RATING",    
  "PatientSurveyStarRating_H_COMP_3_STAR_RATING",    
  "PatientSurveyStarRating_H_COMP_5_STAR_RATING",    
  "PatientSurveyStarRating_H_COMP_6_STAR_RATING",    
  "PatientSurveyStarRating_H_COMP_7_STAR_RATING",    
  "PatientSurveyStarRating_H_CLEAN_STAR_RATING",     
  "PatientSurveyStarRating_H_QUIET_STAR_RATING",     
  "PatientSurveyStarRating_H_HSP_RATING_STAR_RATING",
  "PatientSurveyStarRating_H_RECMND_STAR_RATING",    
  "PatientSurveyStarRating_H_STAR_RATING",
  "HcahpsLinearMeanValue_H_COMP_1_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_COMP_2_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_COMP_3_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_COMP_5_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_COMP_6_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_COMP_7_LINEAR_SCORE",    
  "HcahpsLinearMeanValue_H_CLEAN_LINEAR_SCORE",     
  "HcahpsLinearMeanValue_H_QUIET_LINEAR_SCORE"
)

# Remove specified columns
HipKneeTest <- HipKneeTest %>% select(-all_of(columns_to_remove))
# Print column names to verify
print("Remaining columns:")
## [1] "Remaining columns:"
print(colnames(HipKneeTest))
##  [1] "FacilityId"                                     
##  [2] "PredictedReadmissionRate_HIP-KNEE"              
##  [3] "HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE"
##  [4] "EDV"                                            
##  [5] "HCP_COVID_19"                                   
##  [6] "IMM_3"                                          
##  [7] "OP_18b"                                         
##  [8] "OP_29"                                          
##  [9] "SAFE_USE_OF_OPIOIDS"                            
## [10] "VTE_1"                                          
## [11] "Score_COMP_HIP_KNEE"                            
## [12] "Score_MORT_30_AMI"                              
## [13] "Score_MORT_30_COPD"                             
## [14] "Score_MORT_30_HF"                               
## [15] "Score_MORT_30_PN"                               
## [16] "Score_MORT_30_STK"                              
## [17] "Score_PSI_03"                                   
## [18] "Score_PSI_04"                                   
## [19] "Score_PSI_06"                                   
## [20] "Score_PSI_08"                                   
## [21] "Score_PSI_09"                                   
## [22] "Score_PSI_10"                                   
## [23] "Score_PSI_11"                                   
## [24] "Score_PSI_12"                                   
## [25] "Score_PSI_13"                                   
## [26] "Score_PSI_14"                                   
## [27] "Score_PSI_15"                                   
## [28] "FacilityName"                                   
## [29] "State"                                          
## [30] "Payment_PAYM_90_HIP_KNEE"                       
## [31] "StateCode"
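
Because `all_of()` raises an error when any listed name is absent, it can help to compare the removal list against the data frame first. The sketch below uses base R on a hypothetical toy data frame; `STK_99` is a made-up column name included only to show the check.

```r
# Toy data frame with a few columns named like those in HipKneeTest
toy <- data.frame(
  FacilityId = c("010001", "010005"),
  OP_22      = c(5, 3),
  VTE_2      = c(99, NA),
  IMM_3      = c(95, 80)
)

# Removal list; "STK_99" is hypothetical and does not exist in `toy`
toy_remove <- c("OP_22", "VTE_2", "STK_99")

# Names requested for removal that are not present in the data
missing_cols <- setdiff(toy_remove, names(toy))

# Keep every column that is not on the removal list
toy_reduced <- toy[, setdiff(names(toy), toy_remove), drop = FALSE]

missing_cols
names(toy_reduced)
```

In dplyr, `any_of()` offers the tolerant behaviour directly, at the cost of silently ignoring misspelled names.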

Reassess collinearity with heatmap and correlation matrix

# Compute the correlation matrix on numeric columns, using all complete pairs of observations
cor_matrix <- cor(HipKneeTest %>% select(where(is.numeric)), use = "pairwise.complete.obs")

# Melt the correlation matrix into a long format
cor_melted <- melt(cor_matrix)

# Plot heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Figure 5. Correlation Heatmap of Numeric Variables")

# Convert correlation matrix to df
cor_table <- as.data.frame(cor_matrix)

# Add variable names as a column
cor_table$Variable <- rownames(cor_table)

# Reorder columns
cor_table <- cor_table %>%
  select(Variable, everything())

# Print table
cor_table %>%
  kable(caption = "Table 8. Correlation Coefficients Table") %>%
  kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
Table 8. Correlation Coefficients Table
Variable PredictedReadmissionRate_HIP-KNEE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE EDV HCP_COVID_19 IMM_3 OP_18b OP_29 SAFE_USE_OF_OPIOIDS VTE_1 Score_COMP_HIP_KNEE Score_MORT_30_AMI Score_MORT_30_COPD Score_MORT_30_HF Score_MORT_30_PN Score_MORT_30_STK Score_PSI_03 Score_PSI_04 Score_PSI_06 Score_PSI_08 Score_PSI_09 Score_PSI_10 Score_PSI_11 Score_PSI_12 Score_PSI_13 Score_PSI_14 Score_PSI_15 Payment_PAYM_90_HIP_KNEE
PredictedReadmissionRate_HIP-KNEE PredictedReadmissionRate_HIP-KNEE 1.0000000 -0.2060912 0.1986939 -0.0563082 -0.0028840 0.1295727 -0.0106510 0.1063002 0.0654668 0.3208550 0.0074065 -0.0794948 -0.1067828 -0.0985660 -0.0376746 -0.0037334 -0.0449077 0.0154891 -0.0214412 -0.0182303 0.0710046 0.1130121 0.1047402 0.1193336 0.0140012 -0.0158282 0.2975679
HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE -0.2060912 1.0000000 -0.2262341 0.0302154 0.2182505 -0.2448842 0.0865028 0.1100002 -0.0248375 -0.1091775 -0.0952111 -0.0230632 0.0300940 -0.0702915 -0.0760295 -0.0499063 -0.0948401 -0.0054742 -0.0590998 0.0707774 -0.0157615 -0.1615833 -0.0674588 -0.1458284 -0.0172231 0.0304634 -0.2108956
EDV EDV 0.1986939 -0.2262341 1.0000000 0.1599806 0.0038674 0.5918897 0.0603877 -0.1223889 0.2992859 -0.0240093 -0.0687401 -0.0840621 -0.2588739 -0.1904351 -0.0754281 0.0292819 -0.0438108 -0.0308989 -0.1637017 -0.0181523 0.0837980 0.0455190 0.0749218 0.0724936 0.0399935 0.0144400 0.0053946
HCP_COVID_19 HCP_COVID_19 -0.0563082 0.0302154 0.1599806 1.0000000 0.3203622 0.2574291 0.1067941 -0.0812735 0.0241622 -0.0510683 -0.0869890 -0.1128278 -0.1245435 -0.1523779 -0.0988833 0.0943953 0.0417007 0.0225916 -0.0272232 0.0549990 0.0009222 -0.0909811 0.1091949 -0.0160408 0.0169512 0.0430812 -0.0627505
IMM_3 IMM_3 -0.0028840 0.2182505 0.0038674 0.3203622 1.0000000 0.1105628 0.1317922 0.0410289 0.0906329 -0.0212916 -0.0165321 -0.0616051 -0.0010634 -0.0761104 0.0146397 0.0451508 0.0601831 0.0544576 -0.0311579 0.0899361 0.0412713 -0.0625676 0.0594933 -0.0250714 0.0723872 0.0625753 -0.0720431
OP_18b OP_18b 0.1295727 -0.2448842 0.5918897 0.2574291 0.1105628 1.0000000 0.0506067 -0.1400845 0.2344307 -0.0293698 -0.0678837 -0.1527290 -0.2195933 -0.1858291 -0.0905644 0.0583644 0.0638412 0.0187794 -0.0806516 0.0224833 0.0544528 -0.0076797 0.1653812 0.0993570 0.0676972 0.0374530 -0.0241965
OP_29 OP_29 -0.0106510 0.0865028 0.0603877 0.1067941 0.1317922 0.0506067 1.0000000 -0.0650231 0.1567526 -0.0096464 -0.0600569 0.0081252 0.0099705 -0.0536184 -0.0251654 -0.0032584 0.0312780 0.0059688 -0.0199006 0.0150699 0.0331569 -0.0837910 -0.0108546 -0.0087678 0.0163779 0.0419068 -0.0815209
SAFE_USE_OF_OPIOIDS SAFE_USE_OF_OPIOIDS 0.1063002 0.1100002 -0.1223889 -0.0812735 0.0410289 -0.1400845 -0.0650231 1.0000000 -0.0563373 -0.0081923 -0.0643353 -0.0362573 0.0171605 -0.0204107 -0.0804707 -0.0287158 -0.0995591 -0.0603629 -0.0017558 -0.0043396 -0.0382369 0.0137283 -0.0380145 -0.0258097 -0.0166146 -0.0268805 -0.0048449
VTE_1 VTE_1 0.0654668 -0.0248375 0.2992859 0.0241622 0.0906329 0.2344307 0.1567526 -0.0563373 1.0000000 -0.0526925 -0.0493931 -0.0282911 -0.1051171 -0.1710235 -0.1238916 -0.0378500 -0.1180160 -0.0262166 -0.0522876 -0.0577213 -0.0037803 -0.0301775 -0.0317534 -0.0440578 0.0086141 0.0326658 -0.1256363
Score_COMP_HIP_KNEE Score_COMP_HIP_KNEE 0.3208550 -0.1091775 -0.0240093 -0.0510683 -0.0212916 -0.0293698 -0.0096464 -0.0081923 -0.0526925 1.0000000 0.0830479 -0.0203930 -0.0007242 0.0241066 0.0211621 0.0498557 0.0038509 0.0505415 0.0577776 0.0540124 0.0813038 0.1279724 0.1458258 0.1334619 0.0498603 0.0433809 0.3410864
Score_MORT_30_AMI Score_MORT_30_AMI 0.0074065 -0.0952111 -0.0687401 -0.0869890 -0.0165321 -0.0678837 -0.0600569 -0.0643353 -0.0493931 0.0830479 1.0000000 0.2498600 0.3407616 0.3309425 0.2222539 0.0415523 0.2105379 0.0885083 0.1010348 0.0889343 0.1066619 0.1037006 0.0492328 0.0467554 0.0454462 0.0297688 0.0591548
Score_MORT_30_COPD Score_MORT_30_COPD -0.0794948 -0.0230632 -0.0840621 -0.1128278 -0.0616051 -0.1527290 0.0081252 -0.0362573 -0.0282911 -0.0203930 0.2498600 1.0000000 0.3844105 0.3710744 0.2038243 -0.0069743 0.1713379 0.0478268 0.0397571 0.0429090 0.0320669 0.0426574 -0.0532586 0.0026944 0.0734846 0.0340007 -0.0406696
Score_MORT_30_HF Score_MORT_30_HF -0.1067828 0.0300940 -0.2588739 -0.1245435 -0.0010634 -0.2195933 0.0099705 0.0171605 -0.1051171 -0.0007242 0.3407616 0.3844105 1.0000000 0.4479367 0.3147371 0.0371596 0.2556384 0.0679149 0.1051698 0.0707269 0.0383771 0.0362529 -0.0300702 -0.0086832 0.0647245 0.0342374 -0.0350247
Score_MORT_30_PN Score_MORT_30_PN -0.0985660 -0.0702915 -0.1904351 -0.1523779 -0.0761104 -0.1858291 -0.0536184 -0.0204107 -0.1710235 0.0241066 0.3309425 0.3710744 0.4479367 1.0000000 0.3042563 0.0303815 0.2301195 0.0543554 0.0884315 0.0217880 0.0237048 0.0704445 0.0089560 0.0393676 0.0464407 0.0029691 -0.0062985
Score_MORT_30_STK Score_MORT_30_STK -0.0376746 -0.0760295 -0.0754281 -0.0988833 0.0146397 -0.0905644 -0.0251654 -0.0804707 -0.1238916 0.0211621 0.2222539 0.2038243 0.3147371 0.3042563 1.0000000 0.0687216 0.2380935 0.0878847 0.1014879 0.0674377 0.0622532 0.0725381 0.0474896 0.0513975 0.0492194 0.0625191 -0.0272101
Score_PSI_03 Score_PSI_03 -0.0037334 -0.0499063 0.0292819 0.0943953 0.0451508 0.0583644 -0.0032584 -0.0287158 -0.0378500 0.0498557 0.0415523 -0.0069743 0.0371596 0.0303815 0.0687216 1.0000000 0.1353085 0.0601750 0.0636661 0.1407342 0.0386211 0.0114365 0.1186788 0.0298580 0.0596798 0.0999683 0.0086745
Score_PSI_04 Score_PSI_04 -0.0449077 -0.0948401 -0.0438108 0.0417007 0.0601831 0.0638412 0.0312780 -0.0995591 -0.1180160 0.0038509 0.2105379 0.1713379 0.2556384 0.2301195 0.2380935 0.1353085 1.0000000 0.0601419 0.0870693 0.1059485 0.0523892 0.0649032 0.0782559 0.0123489 0.0652098 0.1018205 -0.0766302
Score_PSI_06 Score_PSI_06 0.0154891 -0.0054742 -0.0308989 0.0225916 0.0544576 0.0187794 0.0059688 -0.0603629 -0.0262166 0.0505415 0.0885083 0.0478268 0.0679149 0.0543554 0.0878847 0.0601750 0.0601419 1.0000000 0.0724291 0.1014588 0.0516246 0.0351464 0.1431056 0.0509831 0.0527115 0.0910520 0.0456525
Score_PSI_08 Score_PSI_08 -0.0214412 -0.0590998 -0.1637017 -0.0272232 -0.0311579 -0.0806516 -0.0199006 -0.0017558 -0.0522876 0.0577776 0.1010348 0.0397571 0.1051698 0.0884315 0.1014879 0.0636661 0.0870693 0.0724291 1.0000000 0.0052449 -0.0360093 0.0198090 0.0394605 0.0093444 0.0228045 0.0127268 -0.0041983
Score_PSI_09 Score_PSI_09 -0.0182303 0.0707774 -0.0181523 0.0549990 0.0899361 0.0224833 0.0150699 -0.0043396 -0.0577213 0.0540124 0.0889343 0.0429090 0.0707269 0.0217880 0.0674377 0.1407342 0.1059485 0.1014588 0.0052449 1.0000000 0.0885278 0.0680540 0.1732337 0.0519119 0.1207438 0.2197254 -0.0237660
Score_PSI_10 Score_PSI_10 0.0710046 -0.0157615 0.0837980 0.0009222 0.0412713 0.0544528 0.0331569 -0.0382369 -0.0037803 0.0813038 0.1066619 0.0320669 0.0383771 0.0237048 0.0622532 0.0386211 0.0523892 0.0516246 -0.0360093 0.0885278 1.0000000 0.1626632 0.1079488 0.2303938 0.0453739 0.0830134 0.0497447
Score_PSI_11 Score_PSI_11 0.1130121 -0.1615833 0.0455190 -0.0909811 -0.0625676 -0.0076797 -0.0837910 0.0137283 -0.0301775 0.1279724 0.1037006 0.0426574 0.0362529 0.0704445 0.0725381 0.0114365 0.0649032 0.0351464 0.0198090 0.0680540 0.1626632 1.0000000 0.1172504 0.2506376 -0.0093577 0.0464067 0.1441986
Score_PSI_12 Score_PSI_12 0.1047402 -0.0674588 0.0749218 0.1091949 0.0594933 0.1653812 -0.0108546 -0.0380145 -0.0317534 0.1458258 0.0492328 -0.0532586 -0.0300702 0.0089560 0.0474896 0.1186788 0.0782559 0.1431056 0.0394605 0.1732337 0.1079488 0.1172504 1.0000000 0.1742084 0.0522204 0.1358951 0.0655557
Score_PSI_13 Score_PSI_13 0.1193336 -0.1458284 0.0724936 -0.0160408 -0.0250714 0.0993570 -0.0087678 -0.0258097 -0.0440578 0.1334619 0.0467554 0.0026944 -0.0086832 0.0393676 0.0513975 0.0298580 0.0123489 0.0509831 0.0093444 0.0519119 0.2303938 0.2506376 0.1742084 1.0000000 0.0056987 0.0878105 0.0949467
Score_PSI_14 Score_PSI_14 0.0140012 -0.0172231 0.0399935 0.0169512 0.0723872 0.0676972 0.0163779 -0.0166146 0.0086141 0.0498603 0.0454462 0.0734846 0.0647245 0.0464407 0.0492194 0.0596798 0.0652098 0.0527115 0.0228045 0.1207438 0.0453739 -0.0093577 0.0522204 0.0056987 1.0000000 0.1176726 -0.0181150
Score_PSI_15 Score_PSI_15 -0.0158282 0.0304634 0.0144400 0.0430812 0.0625753 0.0374530 0.0419068 -0.0268805 0.0326658 0.0433809 0.0297688 0.0340007 0.0342374 0.0029691 0.0625191 0.0999683 0.1018205 0.0910520 0.0127268 0.2197254 0.0830134 0.0464067 0.1358951 0.0878105 0.1176726 1.0000000 -0.0467071
Payment_PAYM_90_HIP_KNEE Payment_PAYM_90_HIP_KNEE 0.2975679 -0.2108956 0.0053946 -0.0627505 -0.0720431 -0.0241965 -0.0815209 -0.0048449 -0.1256363 0.3410864 0.0591548 -0.0406696 -0.0350247 -0.0062985 -0.0272101 0.0086745 -0.0766302 0.0456525 -0.0041983 -0.0237660 0.0497447 0.1441986 0.0655557 0.0949467 -0.0181150 -0.0467071 1.0000000

Imputation and Handling of Missing Values

# Change - to _ in HIP-KNEE
colnames(HipKneeTest) <- gsub("-", "_", colnames(HipKneeTest))

# Remove rows with a missing target variable "PredictedReadmissionRate_HIP_KNEE"
HipKneeTest <- HipKneeTest %>% filter(!is.na(PredictedReadmissionRate_HIP_KNEE))

# Remove rows with missing values in the "State", "StateCode", or "FacilityName" columns
HipKneeTest <- HipKneeTest %>% drop_na(State, StateCode, FacilityName)


# Print number of remaining variables and observations
dimensions <- dim(HipKneeTest)
cat("Number of variables:", dimensions[2], "\n")
## Number of variables: 31
cat("Number of observations:", dimensions[1], "\n")
## Number of observations: 1833
# Calculate missing values
missing_values_summary <- HipKneeTest %>%
  summarise(across(everything(), ~ sum(is.na(.)))) %>%
  pivot_longer(cols = everything(), names_to = "Variable", values_to = "Missing_Count") %>%
  mutate(Missing_Percentage = (Missing_Count / nrow(HipKneeTest)) * 100)

# Print table
missing_values_summary %>%
  kable(caption = "Table 7. Missing Values Summary") %>%
  kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
Table 7. Missing Values Summary
Variable Missing_Count Missing_Percentage
FacilityId 0 0.0000000
PredictedReadmissionRate_HIP_KNEE 0 0.0000000
HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 33 1.8003273
EDV 90 4.9099836
HCP_COVID_19 16 0.8728860
IMM_3 16 0.8728860
OP_18b 75 4.0916530
OP_29 222 12.1112930
SAFE_USE_OF_OPIOIDS 69 3.7643208
VTE_1 994 54.2280415
Score_COMP_HIP_KNEE 40 2.1822149
Score_MORT_30_AMI 405 22.0949264
Score_MORT_30_COPD 247 13.4751773
Score_MORT_30_HF 141 7.6923077
Score_MORT_30_PN 125 6.8194217
Score_MORT_30_STK 284 15.4937261
Score_PSI_03 8 0.4364430
Score_PSI_04 575 31.3693399
Score_PSI_06 2 0.1091107
Score_PSI_08 2 0.1091107
Score_PSI_09 2 0.1091107
Score_PSI_10 41 2.2367703
Score_PSI_11 40 2.1822149
Score_PSI_12 2 0.1091107
Score_PSI_13 42 2.2913257
Score_PSI_14 87 4.7463175
Score_PSI_15 29 1.5821058
FacilityName 0 0.0000000
State 0 0.0000000
Payment_PAYM_90_HIP_KNEE 42 2.2913257
StateCode 0 0.0000000

Impute variables with a low percentage of missing values (<5%) using the median. Every variable with missing values in Table 7 is numeric, so no mode imputation for categorical variables is needed here.

# Calculate median for columns with <5% missing values
numeric_vars_low_missing <- c("HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE", "EDV", "HCP_COVID_19", "IMM_3", "OP_18b", "SAFE_USE_OF_OPIOIDS", "Score_COMP_HIP_KNEE", "Score_PSI_03", "Score_PSI_06", "Score_PSI_08", "Score_PSI_09", "Score_PSI_10", "Score_PSI_11", "Score_PSI_12", "Score_PSI_13", "Score_PSI_14", "Score_PSI_15", "Payment_PAYM_90_HIP_KNEE")

for (var in numeric_vars_low_missing) {
  HipKneeTest[[var]][is.na(HipKneeTest[[var]])] <- median(HipKneeTest[[var]], na.rm = TRUE)
}
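
The list of low-missingness variables above is typed out by hand. As a minimal sketch, the same split could be derived from the missingness percentages themselves. The data frame `toy`, the names `low_missing_vars` and `high_missing_vars`, and the choice to treat exactly 5% as "high" are assumptions made for illustration, not part of the original pipeline.

```r
# Toy data standing in for the hospital data (values are made up)
n <- 40
toy <- data.frame(
  a = as.numeric(1:n),
  b = as.numeric(n:1),
  c = rep(c(0.5, 1.5), n / 2)
)
toy$a[3] <- NA       # 1 of 40 missing  -> 2.5%
toy$b[1:10] <- NA    # 10 of 40 missing -> 25%

# Percentage of missing values per column
miss_pct <- colMeans(is.na(toy)) * 100

# Split variables by the 5% threshold (assumption: exactly 5% counts as high)
low_missing_vars  <- names(miss_pct)[miss_pct > 0 & miss_pct < 5]
high_missing_vars <- names(miss_pct)[miss_pct >= 5]

# Median-impute only the low-missingness variables
for (var in low_missing_vars) {
  toy[[var]][is.na(toy[[var]])] <- median(toy[[var]], na.rm = TRUE)
}
```

Deriving the lists this way keeps the imputation step in sync with the missing-values table if the underlying data are refreshed.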

Impute variables with a high percentage of missing values (>5%) using KNN

# Select high missingness variables for KNN imputation
vars_for_knn <- c("VTE_1", "Score_MORT_30_AMI", "Score_MORT_30_COPD", "Score_MORT_30_HF", "Score_MORT_30_PN", "Score_MORT_30_STK", "Score_PSI_04", "OP_29")

# Perform KNN imputation
HipKneeTest_knn <- kNN(HipKneeTest, variable = vars_for_knn, k = 5)

# Remove columns created by the KNN function
HipKneeTest_knn <- HipKneeTest_knn %>% select(-ends_with("_imp"))

# Update HipKneeTest with imputed values
HipKneeTest[vars_for_knn] <- HipKneeTest_knn[vars_for_knn]
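
To make the idea behind KNN imputation concrete, here is a hand-rolled sketch in base R on a toy data frame. It is not the VIM implementation and makes simplifying assumptions: Euclidean distance on standardized, fully observed predictors, and the median of the k nearest donors as the imputed value. The function `knn_impute_median` and the data frame `toy` are hypothetical names introduced only for this illustration.

```r
# Toy data: y has one missing value; x1 and x2 are fully observed
toy <- data.frame(
  x1 = c(1.0, 1.1, 0.9, 5.0, 5.2, 1.05),
  x2 = c(2.0, 2.1, 1.9, 8.0, 8.1, 2.05),
  y  = c(10, 11, 9, 50, 52, NA)
)

knn_impute_median <- function(data, target, predictors, k = 3) {
  X <- scale(data[, predictors, drop = FALSE])  # put predictors on a common scale
  d <- as.matrix(dist(X))                       # Euclidean distances between rows
  donors <- which(!is.na(data[[target]]))       # rows that can donate a value
  for (i in which(is.na(data[[target]]))) {
    # Indices of the k donors closest to row i
    nearest <- donors[order(d[i, donors])][seq_len(min(k, length(donors)))]
    data[[target]][i] <- median(data[[target]][nearest])
  }
  data
}

toy_imputed <- knn_impute_median(toy, target = "y",
                                 predictors = c("x1", "x2"), k = 3)
toy_imputed$y[6]
```

The missing value in row 6 is filled from rows 1 to 3, its closest neighbours in `x1` and `x2`. The sketch also shows that the choice of distance variables matters: identifiers or the target itself would change which rows count as neighbours.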
# Calculate missing values
missing_values_summary <- HipKneeTest %>%
  summarise(across(everything(), ~ sum(is.na(.)))) %>%
  pivot_longer(cols = everything(), names_to = "Variable", values_to = "Missing_Count") %>%
  mutate(Missing_Percentage = (Missing_Count / nrow(HipKneeTest)) * 100)

# Print table
missing_values_summary %>%
  kable(caption = "Table 7. Missing Values Summary After Imputation") %>%
  kable_styling(bootstrap_options = c("hover", "striped", "responsive"))
Table 7. Missing Values Summary After Imputation
Variable Missing_Count Missing_Percentage
FacilityId 0 0
PredictedReadmissionRate_HIP_KNEE 0 0
HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 0 0
EDV 0 0
HCP_COVID_19 0 0
IMM_3 0 0
OP_18b 0 0
OP_29 0 0
SAFE_USE_OF_OPIOIDS 0 0
VTE_1 0 0
Score_COMP_HIP_KNEE 0 0
Score_MORT_30_AMI 0 0
Score_MORT_30_COPD 0 0
Score_MORT_30_HF 0 0
Score_MORT_30_PN 0 0
Score_MORT_30_STK 0 0
Score_PSI_03 0 0
Score_PSI_04 0 0
Score_PSI_06 0 0
Score_PSI_08 0 0
Score_PSI_09 0 0
Score_PSI_10 0 0
Score_PSI_11 0 0
Score_PSI_12 0 0
Score_PSI_13 0 0
Score_PSI_14 0 0
Score_PSI_15 0 0
FacilityName 0 0
State 0 0
Payment_PAYM_90_HIP_KNEE 0 0
StateCode 0 0

Feature Engineer Mortality Data

# Average death rates amongst mortality variables and create new column "Score_Ovr_MORT"
HipKneeTest$Score_Ovr_MORT <- rowMeans(HipKneeTest[, c("Score_MORT_30_AMI", 
                                                         "Score_MORT_30_COPD", 
                                                         "Score_MORT_30_HF", 
                                                         "Score_MORT_30_PN", 
                                                         "Score_MORT_30_STK")], 
                                                          na.rm = TRUE)

# Remove old mortality columns
HipKneeTest <- HipKneeTest[, !(names(HipKneeTest) %in% c("Score_MORT_30_AMI", 
                                                            "Score_MORT_30_COPD",
                                                            "Score_MORT_30_HF", 
                                                            "Score_MORT_30_PN", 
                                                            "Score_MORT_30_STK"))]

Reassess heatmap with engineered mortality data

# Compute correlation matrix
cor_matrix <- cor(HipKneeTest %>% select_if(is.numeric), use = "pairwise.complete.obs")

# Melt the correlation matrix into a long format
cor_melted <- melt(cor_matrix)

# Plot heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), name = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Figure 5. Correlation Heatmap of Numeric Variables")

Save the data for future ease of use

save(HipKneeTest, file = "HipKneeTest.RData")

Descriptive Statistics (AC)

# Create a summary table of descriptive statistics
descr_stats <- describe(HipKneeTrain)
# Remove the rows for facility ID, facility name, state, and state code
descr_stats <- descr_stats %>% filter(!(vars %in% c(1, 23, 24, 26)))
# Remove columns 1, 2, 5, and 6 (vars, n, median, trimmed)
descr_stats <- descr_stats[, -c(1, 2, 5, 6)]

# Create a table with kable
kable(descr_stats, format = "html", caption = "Descriptive Statistics for All Numeric Variables in Final Dataset") %>%
  kable_styling(
    bootstrap_options = c("hover", "striped", "responsive")
  ) %>%
  column_spec(1, bold = TRUE) %>%
  column_spec(2, width = "5em") %>%
  row_spec(0, bold = TRUE, background = "#f2f2f2")
Descriptive Statistics for All Numeric Variables in Final Dataset
mean sd mad min max range skew kurtosis se
PredictedReadmissionRate_HIP_KNEE 4.546561e+00 0.9093914 0.8576841 1.9279 8.569 6.6411 0.4373061 0.4908058 0.0212407
HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 8.672450e+01 3.7230744 2.9652000 66.0000 98.000 32.0000 -0.5851721 1.5050011 0.0869602
EDV 2.694490e+00 1.0250451 1.4826000 1.0000 4.000 3.0000 -0.1385316 -1.1578643 0.0239421
HCP_COVID_19 8.887098e+01 9.5606361 8.3025600 32.6000 100.000 67.4000 -1.3148000 2.1219028 0.2233087
IMM_3 7.932570e+01 17.7478481 16.3086000 4.0000 100.000 96.0000 -1.0958217 0.7241943 0.4145381
OP_18b 1.895739e+02 49.6783664 44.4780000 62.0000 587.000 525.0000 0.9817063 3.3080181 1.1603422
OP_29 9.215876e+01 11.7440127 4.4478000 0.0000 100.000 100.0000 -3.3057142 14.4641144 0.2743060
SAFE_USE_OF_OPIOIDS 1.557229e+01 4.2502679 2.9652000 0.0000 45.000 45.0000 0.5890847 3.1474800 0.0992739
VTE_1 9.107365e+01 9.1561257 4.4478000 5.0000 100.000 95.0000 -4.3773341 29.2741519 0.2138605
Score_COMP_HIP_KNEE 3.179924e+00 0.5539759 0.4447800 1.6000 6.200 4.6000 0.7280978 1.7214783 0.0129393
Score_PSI_03 5.878123e-01 0.5183737 0.2816940 0.0500 6.310 6.2600 3.5027779 21.8351982 0.0121077
Score_PSI_04 1.688142e+02 19.0239898 15.6117780 86.6800 241.810 155.1300 -0.0656641 1.2542628 0.4443451
Score_PSI_06 2.467703e-01 0.0452557 0.0296520 0.1200 0.510 0.3900 0.9085914 1.9551905 0.0010570
Score_PSI_08 8.994540e-02 0.0082004 0.0000000 0.0600 0.130 0.0700 0.5502254 1.7580966 0.0001915
Score_PSI_09 2.514146e+00 0.4953103 0.3409980 1.1000 6.100 5.0000 1.2648156 4.8211377 0.0115690
Score_PSI_10 1.572951e+00 0.3731798 0.1482600 0.4700 4.550 4.0800 1.8287437 7.4056616 0.0087164
Score_PSI_11 8.969864e+00 3.1123203 2.2239000 2.7300 49.000 46.2700 2.7479837 22.3393252 0.0726947
Score_PSI_12 3.582755e+00 0.7944605 0.6671700 1.6100 7.510 5.9000 0.9512312 1.6446910 0.0185563
Score_PSI_13 5.287103e+00 1.0312666 0.7413000 2.1700 10.790 8.6200 1.0438233 2.7498825 0.0240874
Score_PSI_14 2.010540e+00 0.3569335 0.1927380 1.0700 4.400 3.3300 1.9129489 6.4857308 0.0083369
Score_PSI_15 1.102668e+00 0.3271882 0.2223900 0.3500 3.430 3.0800 1.7127063 5.4101230 0.0076422
Payment_PAYM_90_HIP_KNEE 2.105666e+04 1943.7230953 1716.8508000 15936.0000 34916.000 18980.0000 0.7645293 1.9173982 45.3997188
Score_Ovr_MORT 1.305795e+01 1.1615024 1.0378200 8.1600 17.420 9.2600 -0.0667113 0.5511229 0.0271293
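
When dropping rows by index, note that comparing a column against a vector with `!=` recycles the shorter vector element by element, whereas `%in%` tests set membership. The short sketch below uses the row indices 1, 23, 24, and 26 from the descriptive statistics step on a plain integer vector. `vars` here is a stand-in vector rather than the actual `describe()` output, and the warning is suppressed only to keep the example quiet.

```r
# Stand-in for the `vars` column: one index per described variable
vars <- 1:27
drop_idx <- c(1, 23, 24, 26)

# Element-wise comparison: drop_idx is recycled as 1, 23, 24, 26, 1, 23, ...
keep_recycled <- suppressWarnings(vars != drop_idx)

# Membership test: TRUE for every index that is not in drop_idx
keep_membership <- !(vars %in% drop_idx)

length(vars[keep_recycled])    # only index 1 happens to be dropped
length(vars[keep_membership])  # all four indices are dropped
```

With recycling, an index is dropped only if it happens to line up with its own value in the repeated pattern, which is why a filter written with `!=` can silently leave most of the intended rows in place.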
# Select numeric columns
numeric_columns <- HipKneeTrain %>% select_if(is.numeric)

# Melt the data for easier plotting with ggplot2
numeric_melted <- melt(numeric_columns)
## No id variables; using all as measure variables
# Create histograms
ggplot(numeric_melted, aes(x = value)) +
  geom_histogram(bins = 30, fill = "blue", color = "black") +
  facet_wrap(~variable, scales = "free_x") +
  theme_minimal() +
  labs(title = "Histograms of Numeric Variables", x = "Value", y = "Frequency")

Segmentation Analysis

k-means clustering (SE)

# Select numeric columns for clustering
numeric_columns <- HipKneeTrain %>% select_if(is.numeric)

# Standardize features
X_scaled <- scale(numeric_columns)

# Determine optimal number of clusters using elbow plot
set.seed(123)
elbow_plot <- fviz_nbclust(X_scaled, kmeans, method = "wss", k.max = 10) +
  labs(title = "Elbow Plot for Optimal k")

print(elbow_plot)

# Optimal K = 3
optimal_k <- 3
kmeans_result <- kmeans(X_scaled, centers = optimal_k, nstart = 25)

# Create a new df for K-Means Clustering results
HipKneeTrain_K_Means <- HipKneeTrain %>%
  mutate(Cluster = as.factor(kmeans_result$cluster))

# Visualize clusters
fviz_cluster(kmeans_result, data = X_scaled,
             ellipse.type = "convex",
             palette = "jco",
             ggtheme = theme_minimal())

# Cluster characteristics
cluster_summary <- HipKneeTrain_K_Means %>%
  group_by(Cluster) %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)

print(cluster_summary)
## # A tibble: 3 × 24
##   Cluster PredictedReadmission…¹ HcahpsLinearMeanValu…²   EDV HCP_COVID_19 IMM_3
##   <fct>                    <dbl>                  <dbl> <dbl>        <dbl> <dbl>
## 1 1                         4.73                   85.0  2.51         80.6  63.3
## 2 2                         4.19                   88.2  2.53         91.8  85.8
## 3 3                         4.98                   85.9  3.19         92.4  84.7
## # ℹ abbreviated names: ¹​PredictedReadmissionRate_HIP_KNEE,
## #   ²​HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE
## # ℹ 18 more variables: OP_18b <dbl>, OP_29 <dbl>, SAFE_USE_OF_OPIOIDS <dbl>,
## #   VTE_1 <dbl>, Score_COMP_HIP_KNEE <dbl>, Score_PSI_03 <dbl>,
## #   Score_PSI_04 <dbl>, Score_PSI_06 <dbl>, Score_PSI_08 <dbl>,
## #   Score_PSI_09 <dbl>, Score_PSI_10 <dbl>, Score_PSI_11 <dbl>,
## #   Score_PSI_12 <dbl>, Score_PSI_13 <dbl>, Score_PSI_14 <dbl>, …
# Visualize feature distributions across clusters
features_to_plot <- c("PredictedReadmissionRate_HIP_KNEE", "HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE", "Score_COMP_HIP_KNEE", "SAFE_USE_OF_OPIOIDS")

for (feature in features_to_plot) {
  p <- ggplot(HipKneeTrain_K_Means, aes(x = Cluster, y = .data[[feature]], fill = Cluster)) +
    geom_boxplot() +
    theme_minimal() +
    labs(title = paste("Distribution of", feature, "across clusters"))
  print(p)
}

Hierarchical Clustering (SE)

# Perform PCA
pca_result <- prcomp(X_scaled, center = TRUE, scale. = TRUE)

# Visualize variance
fviz_eig(pca_result, addlabels = TRUE)

# Factor map
fviz_pca_var(pca_result, col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE)

# PC scores, three components
pca_scores <- as.data.frame(pca_result$x[, 1:3])

# Store PCA results in a new dataframe
HipKneeTrain_PCA <- HipKneeTrain %>%
  select_if(is.numeric) %>%
  bind_cols(pca_scores)

# Hierarchical Clustering

# Compute distance matrix
dist_matrix <- dist(X_scaled, method = "euclidean")

# Perform hierarchical clustering
hc_result <- hclust(dist_matrix, method = "ward.D2")

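# Compute a within-cluster dispersion measure for different numbers of clusters.
# Note: this sums squared pairwise distances within each cluster, which equals
# the cluster size times the usual centroid-based within-cluster sum of squares.
wcss <- sapply(1:10, function(k) {
  clusters <- cutree(hc_result, k)
  # X_scaled is already standardized, so no further scaling is needed
  tot.withinss <- sum(sapply(unique(clusters), function(c) {
    sum(dist(X_scaled[clusters == c, , drop = FALSE])^2)
  }))
  return(tot.withinss)
})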

# Plot WCSS
plot(1:10, wcss, type = "b", xlab = "Number of Clusters", ylab = "WCSS")
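
For comparison with the k-means elbow plot, the within-cluster sum of squares can also be computed around each cluster centroid, which is the quantity `kmeans()` reports as `tot.withinss`. The sketch below is a minimal illustration on simulated data. `X`, `hc`, `centroid_wcss`, and `pairwise_ss` are hypothetical names, and the simulated matrix only stands in for `X_scaled`.

```r
set.seed(1)
X <- scale(matrix(rnorm(60 * 3), ncol = 3))   # toy stand-in for X_scaled
hc <- hclust(dist(X), method = "ward.D2")

# Sum of squared deviations from each cluster's centroid
centroid_wcss <- function(X, clusters) {
  sum(sapply(unique(clusters), function(cl) {
    members <- X[clusters == cl, , drop = FALSE]
    sum(sweep(members, 2, colMeans(members))^2)
  }))
}

# Sum of squared pairwise distances within each cluster
pairwise_ss <- function(X, clusters) {
  sum(sapply(unique(clusters), function(cl) {
    sum(dist(X[clusters == cl, , drop = FALSE])^2)
  }))
}

wcss_centroid <- sapply(1:10, function(k) centroid_wcss(X, cutree(hc, k)))

# For a single cluster of n points, the pairwise sum is n times the centroid sum
one_cluster <- rep(1, nrow(X))
ratio <- pairwise_ss(X, one_cluster) / centroid_wcss(X, one_cluster)
```

Because the pairwise version weights each cluster by its size, the two curves can bend at different values of k, so it may be worth checking that the chosen number of clusters holds under both.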

# Create clusters with optimal number of clusters from WCSS plot
k <- 3
hc_clusters <- cutree(hc_result, k = k)

# Store hierarchical clustering results in a new dataframe
HipKneeTrain_HC <- HipKneeTrain %>%
  mutate(HC_Cluster = as.factor(hc_clusters))

# Visualize clusters using first three PCs
pca_plot_data <- cbind(pca_scores[, 1:3], Cluster = hc_clusters)
# Pass only the PC scores as data so the cluster label is not treated as a feature
fviz_cluster(list(data = pca_scores, cluster = hc_clusters),
             ellipse.type = "convex",
             palette = "jco",
             ggtheme = theme_minimal(),
             main = "Hierarchical Clustering Results (PCA)")

# Analyze cluster characteristics
hc_cluster_summary <- HipKneeTrain_HC %>%
  group_by(HC_Cluster) %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)

print(hc_cluster_summary)
## # A tibble: 3 × 24
##   HC_Cluster PredictedReadmissionRat…¹ HcahpsLinearMeanValu…²   EDV HCP_COVID_19
##   <fct>                          <dbl>                  <dbl> <dbl>        <dbl>
## 1 1                               4.82                   85.3  2.51         85.1
## 2 2                               4.24                   88.0  2.62         91.0
## 3 3                               4.72                   86.5  3.22         91.4
## # ℹ abbreviated names: ¹​PredictedReadmissionRate_HIP_KNEE,
## #   ²​HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE
## # ℹ 19 more variables: IMM_3 <dbl>, OP_18b <dbl>, OP_29 <dbl>,
## #   SAFE_USE_OF_OPIOIDS <dbl>, VTE_1 <dbl>, Score_COMP_HIP_KNEE <dbl>,
## #   Score_PSI_03 <dbl>, Score_PSI_04 <dbl>, Score_PSI_06 <dbl>,
## #   Score_PSI_08 <dbl>, Score_PSI_09 <dbl>, Score_PSI_10 <dbl>,
## #   Score_PSI_11 <dbl>, Score_PSI_12 <dbl>, Score_PSI_13 <dbl>, …
# Visualize feature distributions across clusters
features_to_plot <- c("PredictedReadmissionRate_HIP_KNEE", "HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE", "Score_COMP_HIP_KNEE", "SAFE_USE_OF_OPIOIDS")

for (feature in features_to_plot) {
  p <- ggplot(HipKneeTrain_HC, aes(x = HC_Cluster, y = .data[[feature]], fill = HC_Cluster)) +
    geom_boxplot() +
    theme_minimal() +
    labs(title = paste("Distribution of", feature, "across Hierarchical Clusters"))
  print(p)
}

> This is a preliminary segmentation analysis, and we plan to tighten it up further. Our initial impression, however, is that clustering may not be especially beneficial for this dataset. What are your thoughts on the preliminary segmentation analysis? Do you have any ideas for making the clusters more meaningful?
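
One possible way to judge whether a given number of clusters is meaningful is the average silhouette width, which can complement the elbow plot. The sketch below runs on simulated two-group data rather than the hospital data. `toy`, `avg_sil`, and `best_k` are illustrative names, and `silhouette()` comes from the cluster package.

```r
library(cluster)  # silhouette()

set.seed(123)
# Two well-separated simulated groups (not the hospital data)
toy <- rbind(
  matrix(rnorm(50 * 2, mean = 0), ncol = 2),
  matrix(rnorm(50 * 2, mean = 10), ncol = 2)
)
toy_scaled <- scale(toy)
d <- dist(toy_scaled)

# Average silhouette width for k = 2, ..., 6
avg_sil <- sapply(2:6, function(k) {
  km <- kmeans(toy_scaled, centers = k, nstart = 25)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(avg_sil) <- 2:6

# k with the highest average silhouette width
best_k <- as.integer(names(which.max(avg_sil)))
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 suggest the clusters overlap. If the hospital data yield low widths for every k, that would support the impression that the data do not segment cleanly. `fviz_nbclust()` also accepts `method = "silhouette"` for a plotted version of the same idea.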

Supervised Modeling with Quantitative Variable

Random Forest (SE)

Creating the model

# Remove unwanted columns from the dataset
HipKneeTrain_RF <- HipKneeTrain %>%
  select(-State, -FacilityName, -FacilityId)

# Define mtry parameter grid
grid <- expand.grid(
  mtry = c(2, 4, 6, 8)
)

# Define CV
train_control <- trainControl(
  method = "cv",         
  number = 7,           
  verboseIter = TRUE     
)

# Train the Random Forest model with grid search
rf_grid_search <- train(
  PredictedReadmissionRate_HIP_KNEE ~ .,   
  data = HipKneeTrain_RF,                    
  method = "rf",                          
  trControl = train_control,              
  tuneGrid = grid,                       
  importance = TRUE,                     
  ntree = 100                         
)
## + Fold1: mtry=2 
## - Fold1: mtry=2 
## + Fold1: mtry=4 
## - Fold1: mtry=4 
## + Fold1: mtry=6 
## - Fold1: mtry=6 
## + Fold1: mtry=8 
## - Fold1: mtry=8 
## + Fold2: mtry=2 
## - Fold2: mtry=2 
## + Fold2: mtry=4 
## - Fold2: mtry=4 
## + Fold2: mtry=6 
## - Fold2: mtry=6 
## + Fold2: mtry=8 
## - Fold2: mtry=8 
## + Fold3: mtry=2 
## - Fold3: mtry=2 
## + Fold3: mtry=4 
## - Fold3: mtry=4 
## + Fold3: mtry=6 
## - Fold3: mtry=6 
## + Fold3: mtry=8 
## - Fold3: mtry=8 
## + Fold4: mtry=2 
## - Fold4: mtry=2 
## + Fold4: mtry=4 
## - Fold4: mtry=4 
## + Fold4: mtry=6 
## - Fold4: mtry=6 
## + Fold4: mtry=8 
## - Fold4: mtry=8 
## + Fold5: mtry=2 
## - Fold5: mtry=2 
## + Fold5: mtry=4 
## - Fold5: mtry=4 
## + Fold5: mtry=6 
## - Fold5: mtry=6 
## + Fold5: mtry=8 
## - Fold5: mtry=8 
## + Fold6: mtry=2 
## - Fold6: mtry=2 
## + Fold6: mtry=4 
## - Fold6: mtry=4 
## + Fold6: mtry=6 
## - Fold6: mtry=6 
## + Fold6: mtry=8 
## - Fold6: mtry=8 
## + Fold7: mtry=2 
## - Fold7: mtry=2 
## + Fold7: mtry=4 
## - Fold7: mtry=4 
## + Fold7: mtry=6 
## - Fold7: mtry=6 
## + Fold7: mtry=8 
## - Fold7: mtry=8 
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 8 on full training set
# Best parameters
best_params <- rf_grid_search$bestTune
print(best_params)
##   mtry
## 4    8
# Extract feature importances
best_rf_model <- rf_grid_search$finalModel
feature_importances <- importance(best_rf_model)

# Convert feature importances to a df
feature_importances_df <- as.data.frame(feature_importances)
feature_importances_df$Feature <- rownames(feature_importances_df)

# Sort importances by %IncMSE
sorted_by_inc_mse <- feature_importances_df %>%
  arrange(desc(`%IncMSE`))

# Sort importances by IncNodePurity
sorted_by_inc_node_purity <- feature_importances_df %>%
  arrange(desc(IncNodePurity))

# Print importances
cat("Feature Importances by %IncMSE:\n")
## Feature Importances by %IncMSE:
print(sorted_by_inc_mse)
##                                                     %IncMSE IncNodePurity
## Score_COMP_HIP_KNEE                             10.72099517  113.08145609
## Payment_PAYM_90_HIP_KNEE                         9.03883574  125.26854000
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE  5.27534292   73.57559485
## OP_18b                                           4.15404171   66.40490249
## StateCode009                                     3.75181579   12.68932168
## Score_PSI_14                                     3.68517112   58.90547765
## EDV                                              3.54546541   29.93993860
## Score_PSI_03                                     3.46582190   58.03081970
## Score_PSI_10                                     3.23107560   58.84939459
## Score_PSI_11                                     3.08855153   62.16132416
## StateCode022                                     3.01081021    7.67914120
## Score_Ovr_MORT                                   2.54248526   56.71609772
## Score_PSI_12                                     2.30684585   57.92670746
## Score_PSI_15                                     2.20278425   53.99298910
## StateCode017                                     2.17699288    5.53669955
## Score_PSI_13                                     2.17475455   62.17694293
## VTE_1                                            2.12580910   47.62238798
## StateCode027                                     2.06830695    2.41220726
## StateCode030                                     2.04843083    4.02894867
## StateCode028                                     2.01652439    1.00660990
## StateCode040                                     1.90785092    2.84701791
## StateCode037                                     1.88293311    0.77661060
## StateCode002                                     1.66104276    1.30433146
## SAFE_USE_OF_OPIOIDS                              1.62970944   48.55372793
## StateCode006                                     1.59630427    1.87440875
## Score_PSI_09                                     1.59564021   53.46189072
## Score_PSI_06                                     1.55882232   39.27187720
## Score_PSI_04                                     1.35238340   55.18931551
## StateCode014                                     1.32430609    5.75899367
## IMM_3                                            1.31862242   43.04444799
## StateCode044                                     1.25970392    1.55475060
## Score_PSI_08                                     1.09407851   17.99849530
## StateCode043                                     1.09398067    4.95968535
## StateCode007                                     0.91807859    1.34055175
## StateCode016                                     0.86359756    2.86664974
## OP_29                                            0.54582706   35.15294543
## StateCode045                                     0.46701246    0.26882544
## StateCode039                                     0.42256120    0.15466717
## StateCode029                                     0.35450222    0.41983563
## StateCode046                                     0.15190201    2.00772867
## StateCode021                                     0.06745647    2.12006229
## StateCode008                                     0.00000000    0.02570528
## HCP_COVID_19                                    -0.08579886   53.81821089
## StateCode035                                    -0.19417318    6.68021439
## StateCode042                                    -0.24927432    1.25474996
## StateCode034                                    -0.26070754    0.27529986
## StateCode032                                    -0.28329467    3.32535504
## StateCode018                                    -0.29430012    2.10533264
## StateCode036                                    -0.29518214    2.76191318
## StateCode013                                    -0.30227278    4.74041027
## StateCode003                                    -0.31581052    0.92501152
## StateCode038                                    -0.37765964    3.28462255
## StateCode005                                    -0.42203032    4.13871395
## StateCode047                                    -0.54564469    1.75726713
## StateCode049                                    -0.55564975    2.93855287
## StateCode026                                    -0.63405234    0.50697302
## StateCode023                                    -0.72584221    3.03005360
## StateCode019                                    -0.77874702    1.07206757
## StateCode004                                    -0.80746546    2.00122946
## StateCode025                                    -0.86678037    3.69146300
## StateCode048                                    -0.89423959    0.82992245
## StateCode012                                    -0.89584566    0.87503983
## StateCode024                                    -0.94335245    0.97347110
## StateCode033                                    -0.94710201    2.35772816
## StateCode015                                    -0.96797290    1.03562782
## StateCode050                                    -1.00503782    0.30357229
## StateCode031                                    -1.03104209    0.27458034
## StateCode041                                    -1.05841539    1.08506574
## StateCode010                                    -1.27820479    3.25993507
## StateCode011                                    -1.39915516    0.61320416
## StateCode020                                    -2.82194723    1.65521496
##                                                                                         Feature
## Score_COMP_HIP_KNEE                                                         Score_COMP_HIP_KNEE
## Payment_PAYM_90_HIP_KNEE                                               Payment_PAYM_90_HIP_KNEE
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE
## OP_18b                                                                                   OP_18b
## StateCode009                                                                       StateCode009
## Score_PSI_14                                                                       Score_PSI_14
## EDV                                                                                         EDV
## Score_PSI_03                                                                       Score_PSI_03
## Score_PSI_10                                                                       Score_PSI_10
## Score_PSI_11                                                                       Score_PSI_11
## StateCode022                                                                       StateCode022
## Score_Ovr_MORT                                                                   Score_Ovr_MORT
## Score_PSI_12                                                                       Score_PSI_12
## Score_PSI_15                                                                       Score_PSI_15
## StateCode017                                                                       StateCode017
## Score_PSI_13                                                                       Score_PSI_13
## VTE_1                                                                                     VTE_1
## StateCode027                                                                       StateCode027
## StateCode030                                                                       StateCode030
## StateCode028                                                                       StateCode028
## StateCode040                                                                       StateCode040
## StateCode037                                                                       StateCode037
## StateCode002                                                                       StateCode002
## SAFE_USE_OF_OPIOIDS                                                         SAFE_USE_OF_OPIOIDS
## StateCode006                                                                       StateCode006
## Score_PSI_09                                                                       Score_PSI_09
## Score_PSI_06                                                                       Score_PSI_06
## Score_PSI_04                                                                       Score_PSI_04
## StateCode014                                                                       StateCode014
## IMM_3                                                                                     IMM_3
## StateCode044                                                                       StateCode044
## Score_PSI_08                                                                       Score_PSI_08
## StateCode043                                                                       StateCode043
## StateCode007                                                                       StateCode007
## StateCode016                                                                       StateCode016
## OP_29                                                                                     OP_29
## StateCode045                                                                       StateCode045
## StateCode039                                                                       StateCode039
## StateCode029                                                                       StateCode029
## StateCode046                                                                       StateCode046
## StateCode021                                                                       StateCode021
## StateCode008                                                                       StateCode008
## HCP_COVID_19                                                                       HCP_COVID_19
## StateCode035                                                                       StateCode035
## StateCode042                                                                       StateCode042
## StateCode034                                                                       StateCode034
## StateCode032                                                                       StateCode032
## StateCode018                                                                       StateCode018
## StateCode036                                                                       StateCode036
## StateCode013                                                                       StateCode013
## StateCode003                                                                       StateCode003
## StateCode038                                                                       StateCode038
## StateCode005                                                                       StateCode005
## StateCode047                                                                       StateCode047
## StateCode049                                                                       StateCode049
## StateCode026                                                                       StateCode026
## StateCode023                                                                       StateCode023
## StateCode019                                                                       StateCode019
## StateCode004                                                                       StateCode004
## StateCode025                                                                       StateCode025
## StateCode048                                                                       StateCode048
## StateCode012                                                                       StateCode012
## StateCode024                                                                       StateCode024
## StateCode033                                                                       StateCode033
## StateCode015                                                                       StateCode015
## StateCode050                                                                       StateCode050
## StateCode031                                                                       StateCode031
## StateCode041                                                                       StateCode041
## StateCode010                                                                       StateCode010
## StateCode011                                                                       StateCode011
## StateCode020                                                                       StateCode020
cat("\nFeature Importances by IncNodePurity:\n")
## 
## Feature Importances by IncNodePurity:
print(sorted_by_inc_node_purity)
##                                                     %IncMSE IncNodePurity
## Payment_PAYM_90_HIP_KNEE                         9.03883574  125.26854000
## Score_COMP_HIP_KNEE                             10.72099517  113.08145609
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE  5.27534292   73.57559485
## OP_18b                                           4.15404171   66.40490249
## Score_PSI_13                                     2.17475455   62.17694293
## Score_PSI_11                                     3.08855153   62.16132416
## Score_PSI_14                                     3.68517112   58.90547765
## Score_PSI_10                                     3.23107560   58.84939459
## Score_PSI_03                                     3.46582190   58.03081970
## Score_PSI_12                                     2.30684585   57.92670746
## Score_Ovr_MORT                                   2.54248526   56.71609772
## Score_PSI_04                                     1.35238340   55.18931551
## Score_PSI_15                                     2.20278425   53.99298910
## HCP_COVID_19                                    -0.08579886   53.81821089
## Score_PSI_09                                     1.59564021   53.46189072
## SAFE_USE_OF_OPIOIDS                              1.62970944   48.55372793
## VTE_1                                            2.12580910   47.62238798
## IMM_3                                            1.31862242   43.04444799
## Score_PSI_06                                     1.55882232   39.27187720
## OP_29                                            0.54582706   35.15294543
## EDV                                              3.54546541   29.93993860
## Score_PSI_08                                     1.09407851   17.99849530
## StateCode009                                     3.75181579   12.68932168
## StateCode022                                     3.01081021    7.67914120
## StateCode035                                    -0.19417318    6.68021439
## StateCode014                                     1.32430609    5.75899367
## StateCode017                                     2.17699288    5.53669955
## StateCode043                                     1.09398067    4.95968535
## StateCode013                                    -0.30227278    4.74041027
## StateCode005                                    -0.42203032    4.13871395
## StateCode030                                     2.04843083    4.02894867
## StateCode025                                    -0.86678037    3.69146300
## StateCode032                                    -0.28329467    3.32535504
## StateCode038                                    -0.37765964    3.28462255
## StateCode010                                    -1.27820479    3.25993507
## StateCode023                                    -0.72584221    3.03005360
## StateCode049                                    -0.55564975    2.93855287
## StateCode016                                     0.86359756    2.86664974
## StateCode040                                     1.90785092    2.84701791
## StateCode036                                    -0.29518214    2.76191318
## StateCode027                                     2.06830695    2.41220726
## StateCode033                                    -0.94710201    2.35772816
## StateCode021                                     0.06745647    2.12006229
## StateCode018                                    -0.29430012    2.10533264
## StateCode046                                     0.15190201    2.00772867
## StateCode004                                    -0.80746546    2.00122946
## StateCode006                                     1.59630427    1.87440875
## StateCode047                                    -0.54564469    1.75726713
## StateCode020                                    -2.82194723    1.65521496
## StateCode044                                     1.25970392    1.55475060
## StateCode007                                     0.91807859    1.34055175
## StateCode002                                     1.66104276    1.30433146
## StateCode042                                    -0.24927432    1.25474996
## StateCode041                                    -1.05841539    1.08506574
## StateCode019                                    -0.77874702    1.07206757
## StateCode015                                    -0.96797290    1.03562782
## StateCode028                                     2.01652439    1.00660990
## StateCode024                                    -0.94335245    0.97347110
## StateCode003                                    -0.31581052    0.92501152
## StateCode012                                    -0.89584566    0.87503983
## StateCode048                                    -0.89423959    0.82992245
## StateCode037                                     1.88293311    0.77661060
## StateCode011                                    -1.39915516    0.61320416
## StateCode026                                    -0.63405234    0.50697302
## StateCode029                                     0.35450222    0.41983563
## StateCode050                                    -1.00503782    0.30357229
## StateCode034                                    -0.26070754    0.27529986
## StateCode031                                    -1.03104209    0.27458034
## StateCode045                                     0.46701246    0.26882544
## StateCode039                                     0.42256120    0.15466717
## StateCode008                                     0.00000000    0.02570528
##                                                                                         Feature
## Payment_PAYM_90_HIP_KNEE                                               Payment_PAYM_90_HIP_KNEE
## Score_COMP_HIP_KNEE                                                         Score_COMP_HIP_KNEE
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE
## OP_18b                                                                                   OP_18b
## Score_PSI_13                                                                       Score_PSI_13
## Score_PSI_11                                                                       Score_PSI_11
## Score_PSI_14                                                                       Score_PSI_14
## Score_PSI_10                                                                       Score_PSI_10
## Score_PSI_03                                                                       Score_PSI_03
## Score_PSI_12                                                                       Score_PSI_12
## Score_Ovr_MORT                                                                   Score_Ovr_MORT
## Score_PSI_04                                                                       Score_PSI_04
## Score_PSI_15                                                                       Score_PSI_15
## HCP_COVID_19                                                                       HCP_COVID_19
## Score_PSI_09                                                                       Score_PSI_09
## SAFE_USE_OF_OPIOIDS                                                         SAFE_USE_OF_OPIOIDS
## VTE_1                                                                                     VTE_1
## IMM_3                                                                                     IMM_3
## Score_PSI_06                                                                       Score_PSI_06
## OP_29                                                                                     OP_29
## EDV                                                                                         EDV
## Score_PSI_08                                                                       Score_PSI_08
## StateCode009                                                                       StateCode009
## StateCode022                                                                       StateCode022
## StateCode035                                                                       StateCode035
## StateCode014                                                                       StateCode014
## StateCode017                                                                       StateCode017
## StateCode043                                                                       StateCode043
## StateCode013                                                                       StateCode013
## StateCode005                                                                       StateCode005
## StateCode030                                                                       StateCode030
## StateCode025                                                                       StateCode025
## StateCode032                                                                       StateCode032
## StateCode038                                                                       StateCode038
## StateCode010                                                                       StateCode010
## StateCode023                                                                       StateCode023
## StateCode049                                                                       StateCode049
## StateCode016                                                                       StateCode016
## StateCode040                                                                       StateCode040
## StateCode036                                                                       StateCode036
## StateCode027                                                                       StateCode027
## StateCode033                                                                       StateCode033
## StateCode021                                                                       StateCode021
## StateCode018                                                                       StateCode018
## StateCode046                                                                       StateCode046
## StateCode004                                                                       StateCode004
## StateCode006                                                                       StateCode006
## StateCode047                                                                       StateCode047
## StateCode020                                                                       StateCode020
## StateCode044                                                                       StateCode044
## StateCode007                                                                       StateCode007
## StateCode002                                                                       StateCode002
## StateCode042                                                                       StateCode042
## StateCode041                                                                       StateCode041
## StateCode019                                                                       StateCode019
## StateCode015                                                                       StateCode015
## StateCode028                                                                       StateCode028
## StateCode024                                                                       StateCode024
## StateCode003                                                                       StateCode003
## StateCode012                                                                       StateCode012
## StateCode048                                                                       StateCode048
## StateCode037                                                                       StateCode037
## StateCode011                                                                       StateCode011
## StateCode026                                                                       StateCode026
## StateCode029                                                                       StateCode029
## StateCode050                                                                       StateCode050
## StateCode034                                                                       StateCode034
## StateCode031                                                                       StateCode031
## StateCode045                                                                       StateCode045
## StateCode039                                                                       StateCode039
## StateCode008                                                                       StateCode008

Assessing Random Forest Performance

# Remove columns from the test set to match train set
HipKneeTest_RF <- HipKneeTest %>%
  select(-State, -FacilityName, -FacilityId)

# Make predictions on test set
rf_predictions <- predict(rf_grid_search, newdata = HipKneeTest_RF)

# Actual values
actual_values <- HipKneeTest$PredictedReadmissionRate_HIP_KNEE

# Calculate RMSE
mse <- mean((rf_predictions - actual_values)^2)
rmse <- sqrt(mse)

# Calculate R-squared
ss_total <- sum((actual_values - mean(actual_values))^2)
ss_residual <- sum((rf_predictions - actual_values)^2)
r_squared <- 1 - (ss_residual / ss_total)

# Print RMSE and R-squared
cat("RMSE on test set:\n")
## RMSE on test set:
print(rmse)
## [1] 0.3616222
cat("\nR-squared on test set:\n")
## 
## R-squared on test set:
print(r_squared)
## [1] 0.8417858

Testing assumptions

# Calculate residuals
residuals_rf <- actual_values - rf_predictions

# Residuals vs Fitted Values plot
ggplot(data = NULL, aes(x = rf_predictions, y = residuals_rf)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", se = FALSE, color = "blue") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted Values",
       y = "Residuals") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

# Histogram of residuals
ggplot(data = NULL, aes(x = residuals_rf)) +
  geom_histogram(binwidth = 0.1, fill = "blue", alpha = 0.7, boundary = 0) +
  labs(title = "Histogram of Residuals",
       x = "Residuals",
       y = "Frequency") +
  theme_minimal()

# QQ plot of residuals
qqnorm(residuals_rf, main = "QQ Plot of Residuals")
qqline(residuals_rf, col = "red")

# Perform Durbin-Watson test for autocorrelation in residuals
dw_test_result <- dwtest(lm(residuals_rf ~ rf_predictions))
print(dw_test_result)
## 
##  Durbin-Watson test
## 
## data:  lm(residuals_rf ~ rf_predictions)
## DW = 1.8049, p-value = 1.44e-05
## alternative hypothesis: true autocorrelation is greater than 0
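
The Durbin-Watson statistic reported above is built from differences between successive residuals, so its value depends on the row order of the test set. As a rough illustration, the statistic can be sketched in base R as below. The helper name `dw_statistic` and the toy residual vectors are assumptions for this example only. Unlike `dwtest()`, the sketch returns only the statistic (no p-value) and is applied directly to a residual vector rather than to the residuals of a fitted `lm()`.

```r
# Durbin-Watson statistic: sum of squared successive differences
# divided by the sum of squared residuals
dw_statistic <- function(e) {
  sum(diff(e)^2) / sum(e^2)
}

# Toy residuals (illustrative only)
alternating <- c(1, -1, 1, -1)   # sign flips at every step
constant    <- c(1, 1, 1, 1)     # no change between steps

dw_statistic(alternating)  # values above 2 point to negative autocorrelation
dw_statistic(constant)     # values near 0 point to positive autocorrelation
```

Values close to 2 are what uncorrelated residuals would be expected to produce.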

Elastic Net (AC)

Creating the model

# Separate predictors and response variable in the training set
# (FacilityId is an identifier, not a predictor, so it is dropped as in the Random Forest model)
x_train <- as.matrix(HipKneeTrain %>% select(-c(State, FacilityName, FacilityId, PredictedReadmissionRate_HIP_KNEE)))
y_train <- HipKneeTrain$PredictedReadmissionRate_HIP_KNEE

# Separate predictors and response variable in the test set
x_test <- as.matrix(HipKneeTest %>% select(-c(State, FacilityName, FacilityId, PredictedReadmissionRate_HIP_KNEE)))
y_test <- HipKneeTest$PredictedReadmissionRate_HIP_KNEE

# Define the grid of hyperparameters
searchGrid <- expand.grid(.alpha = seq(0, 1, length.out = 10), 
                          .lambda = seq(0, 5, length.out = 15))

# Define the train control
ctrl <- trainControl(method = "repeatedcv", 
                     number = 10, 
                     repeats = 5, 
                     search = "grid", 
                     verboseIter = FALSE)

# Train the elastic net with repeated 10-fold cross-validation over the grid
elasticnet_model <- train(
  x = x_train,
  y = y_train,
  method = "glmnet",
  trControl = ctrl,
  tuneGrid = searchGrid
)
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
## : There were missing values in resampled performance measures.
# Best hyperparameters
best_alpha <- elasticnet_model$bestTune$alpha
best_lambda <- elasticnet_model$bestTune$lambda

# Print best alpha and lambda
print(paste("Best Alpha: ", best_alpha))
## [1] "Best Alpha:  0"
print(paste("Best Lambda: ", best_lambda))
## [1] "Best Lambda:  0"
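
Both selected values sit on the lower edge of the search grid, and a linear grid from 0 to 5 spends most of its candidates on comparatively strong penalties. One possible alternative is a log-spaced grid that concentrates candidates near zero. The sketch below is only an illustration: the names `lambda_grid` and `searchGrid_log` are hypothetical, and the range from 1e-4 to 10 is an assumption rather than a value taken from this analysis.

```r
# Log-spaced lambda candidates (assumed range: 1e-4 to 10)
lambda_grid <- 10^seq(-4, 1, length.out = 15)

# Same alpha candidates as above, crossed with the log-spaced lambdas
searchGrid_log <- expand.grid(alpha = seq(0, 1, length.out = 10),
                              lambda = lambda_grid)

# 10 alpha values x 15 lambda values give 150 candidate pairs
nrow(searchGrid_log)
```

Such a grid could be passed to `train()` through `tuneGrid` in the same way as `searchGrid`.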

Assessing performance

# Make predictions on the test set
predictions <- predict(elasticnet_model, newdata = x_test)

# Calculate RMSE
rmse <- sqrt(mean((predictions - y_test)^2))

# Print RMSE
print(paste("RMSE on Test Set: ", rmse))
## [1] "RMSE on Test Set:  0.799586716495415"
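# Calculate performance metrics on the test set
performance <- postResample(pred = predictions, obs = y_test)

# Extract and print R-squared
# (postResample() reports the squared correlation between predictions and observations,
# which is not the same quantity as the 1 - RSS/TSS value used for the other models)
r_squared <- performance["Rsquared"]
print(paste("R^2 on Test Set: ", r_squared))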
## [1] "R^2 on Test Set:  0.226601204078256"
# Get the feature importance
important <- varImp(elasticnet_model)$importance

# View the feature importance
important %>% 
  mutate(Feature = rownames(important)) %>% 
  mutate(Feature = gsub("\\.", " ", Feature)) %>% 
  arrange(desc(Overall)) %>% 
  ggplot(aes(y = Overall, fill = Overall, x = fct_reorder(Feature, Overall))) +
  geom_col() + 
  scale_fill_continuous(low = "lightblue", high = "darkblue") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Feature importance as determined by Elastic Net",
       x = "",
       y = "Importance", 
       fill = "")
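
The elastic net R^2 above is taken from `postResample()`, which by default reports the squared correlation between predictions and observations, while the Random Forest and SVM sections compute 1 - RSS/TSS. The two definitions can disagree, which matters when comparing models. The sketch below illustrates the difference on toy data. The function names `r2_explained` and `r2_correlation` and the vectors `obs` and `pred` are hypothetical and not part of the analysis.

```r
# Two common definitions of R-squared for out-of-sample predictions
r2_explained <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}
r2_correlation <- function(obs, pred) {
  cor(obs, pred)^2
}

# Toy data (illustrative only): predictions are perfectly correlated
# with the observations but shifted upward by a constant bias
obs  <- c(1, 2, 3, 4, 5)
pred <- obs + 2

r2_correlation(obs, pred)  # squared correlation ignores the constant bias
r2_explained(obs, pred)    # penalizes the bias and can be negative
```

Applying one definition to all models would make the reported R^2 values directly comparable.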

SVM (AC)

Creating the model

Selecting the kernel

# Convert x_train and x_test back to data frames
x_train <- as.data.frame(x_train)
x_test <- as.data.frame(x_test)

# Ensure all columns are numeric
x_train[] <- lapply(x_train, as.numeric)
x_test[] <- lapply(x_test, as.numeric)

# Combine the response and the predictors into a single training data frame
train_data <- cbind(y_train = y_train, x_train)

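# Define the grid for kernel types
# (the column name must match the svm() argument `kernel`;
# any other name is passed through `...` and silently ignored)
searchGrid_kernel <- expand.grid(
  kernel = c("linear", "polynomial", "radial", "sigmoid")
)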

# Train the SVM model with kernel tuning
svm_tune_kernel <- tune(svm, 
                        y_train ~ ., 
                        data = train_data, 
                        ranges = searchGrid_kernel, 
                        tunecontrol = tune.control(
                          sampling = "cross", 
                          cross = 10
                        )
)

# Extract the best kernel
best_kernel <- svm_tune_kernel$best.model$kernel
if (best_kernel == 0) {
  kernel_description <- "Linear kernel"
} else if (best_kernel == 1) {
  kernel_description <- "Polynomial kernel"
} else if (best_kernel == 2) {
  kernel_description <- "Radial kernel"
} else if (best_kernel == 3) {
  kernel_description <- "Sigmoid kernel"
} else {
  kernel_description <- "Unknown kernel"
}

cat("Best Kernel Description:", kernel_description, "\n")
## Best Kernel Description: Radial kernel

Tuning gamma and cost

# Define the grid for gamma
searchGrid_gamma <- expand.grid(
  gamma = c(0.01, 0.1, 1)
)

# Train the SVM model with gamma tuning
svm_tune_gamma <- tune(svm, 
                       y_train ~ ., 
                       data = train_data, 
                       ranges = searchGrid_gamma, 
                       kernel = "radial", 
                       tunecontrol = tune.control(
                         sampling = "cross", 
                         cross = 10
                       )
)

# Extract the best gamma
best_gamma <- svm_tune_gamma$best.model$gamma
cat("Best Gamma:", best_gamma, "\n")
## Best Gamma: 0.01
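# Define the grid for cost
# (the column name must match the svm() argument `cost`;
# a column named `C` is passed through `...` and silently ignored)
searchGrid_cost <- expand.grid(
  cost = c(0.1, 1, 10)
)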

# Train the SVM model with cost tuning
svm_tune_cost <- tune(svm, 
                      y_train ~ ., 
                      data = train_data, 
                      ranges = searchGrid_cost, 
                      kernel = "radial", 
                      tunecontrol = tune.control(
                        sampling = "cross", 
                        cross = 10
                      )
)

# Extract the best cost
best_cost <- svm_tune_cost$best.model$cost
cat("Best Cost:", best_cost, "\n")
## Best Cost: 1

Create the final model

# Final model with best parameters
# (the cost argument of svm() is named `cost`; probability = TRUE is not needed for regression)
svm_final <- svm(y_train ~ .,
                 data = train_data,
                 kernel = "radial",
                 cost = 1,
                 gamma = 0.01)

Assessing performance

# Make predictions on the test set
# (predict() for svm objects has no `type` argument, so it is omitted)
predictions <- predict(svm_final, x_test)

# Calculate RMSE
rmse <- sqrt(mean((predictions - y_test)^2))
cat("RMSE on Test Set:", rmse, "\n")
## RMSE on Test Set: 0.7568615
# Calculate R-squared
rss <- sum((y_test - predictions)^2)
tss <- sum((y_test - mean(y_test))^2)
r_squared <- 1 - (rss / tss)
cat("R-squared on Test Set:", r_squared, "\n")
## R-squared on Test Set: 0.3069442

Supervised Modeling with Qualitative Variable

Categorizing the target variable (AC)

# Compare the median of the target variable in the training and test sets
print(median(HipKneeTrain$PredictedReadmissionRate_HIP_KNEE, na.rm = TRUE))
## [1] 4.4769
print(median(HipKneeTest$PredictedReadmissionRate_HIP_KNEE, na.rm = TRUE))
## [1] 4.4769
# Calculate the median of the target variable from the training data
median_value <- median(HipKneeTrain$PredictedReadmissionRate_HIP_KNEE, na.rm = TRUE)

# Categorize the target variable in the training data
HipKneeTrain_Qual <- HipKneeTrain %>%
  mutate(TargetCategory = ifelse(PredictedReadmissionRate_HIP_KNEE > median_value, 1, 0))

# Categorize the target variable in the testing data using the median from the training data
HipKneeTest_Qual <- HipKneeTest %>%
  mutate(TargetCategory = ifelse(PredictedReadmissionRate_HIP_KNEE > median_value, 1, 0))
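
The categorization above reuses the training median as the cutoff for both sets, so no information from the test set enters the label definition. A small base R illustration of the same rule follows. The vectors `train_rate` and `test_rate` and the helper `label_by_threshold` are made-up names and values for this sketch, not data from the analysis.

```r
# Toy readmission rates (illustrative only)
train_rate <- c(3.9, 4.2, 4.5, 4.8, 5.1)
test_rate  <- c(4.0, 4.5, 4.9)

# The cutoff is learned from the training data only
cutoff <- median(train_rate)

# 1 = above the training median, 0 = at or below it
label_by_threshold <- function(x, cutoff) ifelse(x > cutoff, 1, 0)

train_label <- label_by_threshold(train_rate, cutoff)
test_label  <- label_by_threshold(test_rate, cutoff)
```

Because the comparison is strict, a hospital whose rate equals the cutoff is assigned to class 0.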

Random Forest (AC)

Create the model

# Remove unwanted columns from the dataset
HipKneeTrain_QualRF <- HipKneeTrain_Qual %>%
  select(-State, -FacilityName, -FacilityId, -PredictedReadmissionRate_HIP_KNEE)

# Define mtry parameter grid
grid <- expand.grid(
  mtry = c(2, 4, 6, 8)
)

# Define 7-fold cross-validation
train_control <- trainControl(
  method = "cv",
  number = 7,
  verboseIter = TRUE
)

# Ensure the target variable is a factor
HipKneeTrain_QualRF$TargetCategory <- as.factor(HipKneeTrain_QualRF$TargetCategory)

# Train the Random Forest model with grid search
rf_grid_search_qual <- train(
  TargetCategory ~ .,
  data = HipKneeTrain_QualRF,
  method = "rf",
  trControl = train_control,
  tuneGrid = grid,
  importance = TRUE,
  ntree = 100
)
## + Fold1: mtry=2 
## - Fold1: mtry=2 
## + Fold1: mtry=4 
## - Fold1: mtry=4 
## + Fold1: mtry=6 
## - Fold1: mtry=6 
## + Fold1: mtry=8 
## - Fold1: mtry=8 
## + Fold2: mtry=2 
## - Fold2: mtry=2 
## + Fold2: mtry=4 
## - Fold2: mtry=4 
## + Fold2: mtry=6 
## - Fold2: mtry=6 
## + Fold2: mtry=8 
## - Fold2: mtry=8 
## + Fold3: mtry=2 
## - Fold3: mtry=2 
## + Fold3: mtry=4 
## - Fold3: mtry=4 
## + Fold3: mtry=6 
## - Fold3: mtry=6 
## + Fold3: mtry=8 
## - Fold3: mtry=8 
## + Fold4: mtry=2 
## - Fold4: mtry=2 
## + Fold4: mtry=4 
## - Fold4: mtry=4 
## + Fold4: mtry=6 
## - Fold4: mtry=6 
## + Fold4: mtry=8 
## - Fold4: mtry=8 
## + Fold5: mtry=2 
## - Fold5: mtry=2 
## + Fold5: mtry=4 
## - Fold5: mtry=4 
## + Fold5: mtry=6 
## - Fold5: mtry=6 
## + Fold5: mtry=8 
## - Fold5: mtry=8 
## + Fold6: mtry=2 
## - Fold6: mtry=2 
## + Fold6: mtry=4 
## - Fold6: mtry=4 
## + Fold6: mtry=6 
## - Fold6: mtry=6 
## + Fold6: mtry=8 
## - Fold6: mtry=8 
## + Fold7: mtry=2 
## - Fold7: mtry=2 
## + Fold7: mtry=4 
## - Fold7: mtry=4 
## + Fold7: mtry=6 
## - Fold7: mtry=6 
## + Fold7: mtry=8 
## - Fold7: mtry=8 
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 6 on full training set
# Best parameters
best_params_qual <- rf_grid_search_qual$bestTune
print(best_params_qual)
##   mtry
## 3    6
# Extract feature importances
best_rf_model_qual <- rf_grid_search_qual$finalModel
feature_importances_qual <- importance(best_rf_model_qual)

# Convert feature importances to a df
feature_importances_df_qual <- as.data.frame(feature_importances_qual)
feature_importances_df_qual$Feature <- rownames(feature_importances_df_qual)

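# Rename columns for clarity: for a classifier fit with importance = TRUE,
# importance() returns one column per class, then MeanDecreaseAccuracy and MeanDecreaseGini
colnames(feature_importances_df_qual) <- c("Class0", "Class1", "MeanDecreaseAccuracy", "MeanDecreaseGini", "Feature")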

# Sort by MeanDecreaseGini
sorted_by_importance_qual <- feature_importances_df_qual[order(-feature_importances_df_qual$MeanDecreaseGini), ]

# Print the sorted table
print(sorted_by_importance_qual)
##                                                 MeanDecreaseAccuracy
## Payment_PAYM_90_HIP_KNEE                                  5.40104782
## Score_COMP_HIP_KNEE                                       6.42690149
## StateCode009                                              1.28753785
## StateCode033                                             -0.23498854
## StateCode040                                              1.32376176
## StateCode004                                              0.81405817
## StateCode017                                              2.03368248
## StateCode029                                              0.03386386
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE           5.45513622
## StateCode022                                              0.09316030
## StateCode049                                             -1.64809227
## StateCode023                                             -1.22641937
## StateCode043                                              1.23088597
## VTE_1                                                     0.64384815
## StateCode036                                             -1.60344688
## StateCode008                                              0.00000000
## StateCode011                                              0.00000000
## StateCode045                                             -1.42792626
## StateCode050                                              0.00000000
## StateCode030                                              1.49287893
## StateCode007                                             -1.08718648
## StateCode006                                              1.10438922
## StateCode035                                             -0.28781530
## StateCode042                                              0.23232541
## StateCode031                                              1.73990438
## StateCode018                                              2.13115982
## StateCode048                                             -0.61357874
## StateCode046                                             -0.59552305
## StateCode015                                             -1.73076963
## StateCode044                                              0.61924346
## StateCode002                                              1.42766967
## StateCode014                                              0.46458209
## SAFE_USE_OF_OPIOIDS                                       2.52038546
## StateCode020                                             -0.23214604
## StateCode005                                              1.41842050
## StateCode032                                              1.73366522
## HCP_COVID_19                                              0.63283525
## StateCode038                                              0.24962974
## StateCode013                                              1.51852661
## StateCode003                                             -0.92155915
## IMM_3                                                     0.25908779
## StateCode034                                              0.00000000
## StateCode010                                             -0.72420878
## StateCode037                                              1.20080496
## StateCode024                                              0.65777196
## StateCode021                                             -0.42516852
## StateCode028                                              2.55840703
## Score_PSI_04                                              1.49911233
## EDV                                                       2.68151276
## Score_PSI_08                                              1.92194337
## StateCode027                                             -0.31094173
## StateCode016                                             -0.65490380
## StateCode025                                              0.55362428
## Score_PSI_13                                              4.29631828
## OP_18b                                                    4.72374954
## StateCode047                                              1.82496420
## Score_PSI_11                                              3.60649714
## Score_PSI_09                                              4.16869123
## StateCode012                                              0.44225000
## Score_PSI_10                                              3.98852844
## StateCode019                                             -0.49821290
## StateCode039                                              0.00000000
## StateCode041                                             -1.00503782
## Score_PSI_06                                              4.48727417
## Score_Ovr_MORT                                            3.59600587
## Score_PSI_15                                              4.86247161
## Score_PSI_03                                              7.28901819
## OP_29                                                    -0.29310217
## StateCode026                                              1.91035888
## Score_PSI_12                                              4.36615717
## Score_PSI_14                                              3.91249617
##                                                 MeanDecreaseGini       Feature
## Payment_PAYM_90_HIP_KNEE                              5.31553840  7.2891911346
## Score_COMP_HIP_KNEE                                   5.13184462  8.5515954070
## StateCode009                                          4.71440685  4.2487030381
## StateCode033                                          2.06189841  1.3626355875
## StateCode040                                          2.04643253  2.0508378585
## StateCode004                                          1.74637048  1.9223071966
## StateCode017                                          1.56857840  2.6407675081
## StateCode029                                          1.43565892  1.2989636385
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE       1.40893435  5.4708562784
## StateCode022                                          1.35358638  1.1406398605
## StateCode049                                          1.19935102 -0.2929786409
## StateCode023                                          1.16258401 -0.0285104984
## StateCode043                                          1.12681950  1.8099727276
## VTE_1                                                 1.02477200  1.1815711241
## StateCode036                                          1.00908537 -0.5862420988
## StateCode008                                          1.00503782  1.0050378153
## StateCode011                                          1.00503782  1.0050378153
## StateCode045                                          1.00503782 -0.5380062773
## StateCode050                                          1.00503782  1.0050378153
## StateCode030                                          0.94023038  1.7696737277
## StateCode007                                          0.93368367 -0.2338423728
## StateCode006                                          0.87310824  1.4086100258
## StateCode035                                          0.84843171  0.3643226459
## StateCode042                                          0.76969104  0.8480246838
## StateCode031                                          0.75739057  1.5350119845
## StateCode018                                          0.68114193  1.8020316844
## StateCode048                                          0.67973045  0.3719097271
## StateCode046                                          0.65715122  0.0005004647
## StateCode015                                          0.65544248 -0.8372705829
## StateCode044                                          0.65375127  1.1366517706
## StateCode002                                          0.57597861  1.3561293526
## StateCode014                                          0.53442114  0.7642053440
## SAFE_USE_OF_OPIOIDS                                   0.49581093  1.9598942953
## StateCode020                                          0.34858066  0.1531742113
## StateCode005                                          0.33942281  1.2975198910
## StateCode032                                          0.27532263  1.6132555529
## HCP_COVID_19                                          0.20967022  0.5445997313
## StateCode038                                          0.17102249  0.2385095311
## StateCode013                                          0.03867433  0.9199183513
## StateCode003                                          0.03463470 -0.6080259254
## IMM_3                                                 0.01603399  0.1996032784
## StateCode034                                          0.00000000  0.0000000000
## StateCode010                                         -0.17444291 -0.6321023726
## StateCode037                                         -0.17798628  0.7026700690
## StateCode024                                         -0.26963572  0.2456289052
## StateCode021                                         -0.31172571 -0.4686267375
## StateCode028                                         -0.35594511  1.8337417480
## Score_PSI_04                                         -0.36139671  0.7748653717
## EDV                                                  -0.36421323  1.8723783538
## Score_PSI_08                                         -0.37458029  1.0913351137
## StateCode027                                         -0.38669495 -0.4586326251
## StateCode016                                         -0.39400519 -0.6012473995
## StateCode025                                         -0.42805022  0.2163726204
## Score_PSI_13                                         -0.43657691  2.3765855826
## OP_18b                                               -0.48551025  3.0870400250
## StateCode047                                         -0.50857959  0.7339231933
## Score_PSI_11                                         -0.58140237  1.8423280941
## Score_PSI_09                                         -0.76841166  2.4939441183
## StateCode012                                         -0.82837950 -0.3562642945
## Score_PSI_10                                         -0.88473005  2.5656544622
## StateCode019                                         -0.97199908 -0.9210818626
## StateCode039                                         -1.00503782 -1.0050378153
## StateCode041                                         -1.00503782 -1.3514296116
## Score_PSI_06                                         -1.08850238  2.6467471085
## Score_Ovr_MORT                                       -1.17710646  1.7145488724
## Score_PSI_15                                         -1.40891885  2.7272677514
## Score_PSI_03                                         -1.71514533  4.0199166214
## OP_29                                                -1.78418841 -1.3797878263
## StateCode026                                         -2.03965347  0.6077818054
## Score_PSI_12                                         -2.10504541  1.5876558612
## Score_PSI_14                                         -2.75620002  0.9090930286
##                                                         NA
## Payment_PAYM_90_HIP_KNEE                        61.7802942
## Score_COMP_HIP_KNEE                             52.0169644
## StateCode009                                     6.7006287
## StateCode033                                     1.8199371
## StateCode040                                     1.2680823
## StateCode004                                     1.1663800
## StateCode017                                     2.6144810
## StateCode029                                     0.6778753
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 34.6827457
## StateCode022                                     2.6751510
## StateCode049                                     1.6073748
## StateCode023                                     1.8960085
## StateCode043                                     3.1903899
## VTE_1                                           30.9400056
## StateCode036                                     1.2574410
## StateCode008                                     0.4226257
## StateCode011                                     0.2414930
## StateCode045                                     0.4546357
## StateCode050                                     0.1936687
## StateCode030                                     1.6879774
## StateCode007                                     0.9954967
## StateCode006                                     1.8763672
## StateCode035                                     2.5927025
## StateCode042                                     1.6565044
## StateCode031                                     1.0944128
## StateCode018                                     1.6365892
## StateCode048                                     0.6252896
## StateCode046                                     2.3803928
## StateCode015                                     0.8728028
## StateCode044                                     1.2959189
## StateCode002                                     0.2863423
## StateCode014                                     2.0717800
## SAFE_USE_OF_OPIOIDS                             34.4456811
## StateCode020                                     1.3129294
## StateCode005                                     3.9310891
## StateCode032                                     3.3394571
## HCP_COVID_19                                    38.6093732
## StateCode038                                     3.0553040
## StateCode013                                     2.8996766
## StateCode003                                     1.1355152
## IMM_3                                           31.6186716
## StateCode034                                     0.3516932
## StateCode010                                     1.5600169
## StateCode037                                     1.2435615
## StateCode024                                     1.3484640
## StateCode021                                     1.7177273
## StateCode028                                     1.0616093
## Score_PSI_04                                    39.9670655
## EDV                                             16.4174607
## Score_PSI_08                                    12.4605134
## StateCode027                                     0.6072735
## StateCode016                                     1.2322591
## StateCode025                                     2.1198082
## Score_PSI_13                                    39.2081932
## OP_18b                                          40.6293177
## StateCode047                                     2.1186051
## Score_PSI_11                                    39.1343153
## Score_PSI_09                                    35.7198013
## StateCode012                                     0.6879212
## Score_PSI_10                                    36.4933131
## StateCode019                                     0.6676086
## StateCode039                                     0.1957526
## StateCode041                                     0.2878464
## Score_PSI_06                                    27.9230430
## Score_Ovr_MORT                                  41.5672778
## Score_PSI_15                                    40.8152794
## Score_PSI_03                                    37.7109756
## OP_29                                           28.3816556
## StateCode026                                     0.6832516
## Score_PSI_12                                    35.8119019
## Score_PSI_14                                    37.6190055
##                                                                                              NA
## Payment_PAYM_90_HIP_KNEE                                               Payment_PAYM_90_HIP_KNEE
## Score_COMP_HIP_KNEE                                                         Score_COMP_HIP_KNEE
## StateCode009                                                                       StateCode009
## StateCode033                                                                       StateCode033
## StateCode040                                                                       StateCode040
## StateCode004                                                                       StateCode004
## StateCode017                                                                       StateCode017
## StateCode029                                                                       StateCode029
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE
## StateCode022                                                                       StateCode022
## StateCode049                                                                       StateCode049
## StateCode023                                                                       StateCode023
## StateCode043                                                                       StateCode043
## VTE_1                                                                                     VTE_1
## StateCode036                                                                       StateCode036
## StateCode008                                                                       StateCode008
## StateCode011                                                                       StateCode011
## StateCode045                                                                       StateCode045
## StateCode050                                                                       StateCode050
## StateCode030                                                                       StateCode030
## StateCode007                                                                       StateCode007
## StateCode006                                                                       StateCode006
## StateCode035                                                                       StateCode035
## StateCode042                                                                       StateCode042
## StateCode031                                                                       StateCode031
## StateCode018                                                                       StateCode018
## StateCode048                                                                       StateCode048
## StateCode046                                                                       StateCode046
## StateCode015                                                                       StateCode015
## StateCode044                                                                       StateCode044
## StateCode002                                                                       StateCode002
## StateCode014                                                                       StateCode014
## SAFE_USE_OF_OPIOIDS                                                         SAFE_USE_OF_OPIOIDS
## StateCode020                                                                       StateCode020
## StateCode005                                                                       StateCode005
## StateCode032                                                                       StateCode032
## HCP_COVID_19                                                                       HCP_COVID_19
## StateCode038                                                                       StateCode038
## StateCode013                                                                       StateCode013
## StateCode003                                                                       StateCode003
## IMM_3                                                                                     IMM_3
## StateCode034                                                                       StateCode034
## StateCode010                                                                       StateCode010
## StateCode037                                                                       StateCode037
## StateCode024                                                                       StateCode024
## StateCode021                                                                       StateCode021
## StateCode028                                                                       StateCode028
## Score_PSI_04                                                                       Score_PSI_04
## EDV                                                                                         EDV
## Score_PSI_08                                                                       Score_PSI_08
## StateCode027                                                                       StateCode027
## StateCode016                                                                       StateCode016
## StateCode025                                                                       StateCode025
## Score_PSI_13                                                                       Score_PSI_13
## OP_18b                                                                                   OP_18b
## StateCode047                                                                       StateCode047
## Score_PSI_11                                                                       Score_PSI_11
## Score_PSI_09                                                                       Score_PSI_09
## StateCode012                                                                       StateCode012
## Score_PSI_10                                                                       Score_PSI_10
## StateCode019                                                                       StateCode019
## StateCode039                                                                       StateCode039
## StateCode041                                                                       StateCode041
## Score_PSI_06                                                                       Score_PSI_06
## Score_Ovr_MORT                                                                   Score_Ovr_MORT
## Score_PSI_15                                                                       Score_PSI_15
## Score_PSI_03                                                                       Score_PSI_03
## OP_29                                                                                     OP_29
## StateCode026                                                                       StateCode026
## Score_PSI_12                                                                       Score_PSI_12
## Score_PSI_14                                                                       Score_PSI_14
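
The column labels in the table above appear to be shifted relative to their values: the column labelled MeanDecreaseGini contains negative numbers, while an unlabelled NA column holds values that look like Gini decreases, and the last column simply repeats the row names. One way to avoid this kind of mismatch is to let the importance matrix keep its own column names and attach the feature names afterwards. The sketch below uses a small hypothetical matrix (`imp`, with made-up values) shaped like the output of `randomForest::importance()`; it is not the model's actual importance table.

```r
# Hypothetical importance matrix: rows are features, columns are the two
# measures. The numbers are illustrative, not taken from the fitted model.
imp <- matrix(
  c( 5.3, 61.8,
     1.4, 34.7,
    -2.8, 37.6),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Payment_PAYM_90_HIP_KNEE",
      "HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE",
      "Score_PSI_14"),
    c("MeanDecreaseAccuracy", "MeanDecreaseGini")
  )
)

# Keep the matrix's own column names, then add the feature names as a column
imp_df <- as.data.frame(imp)
imp_df$Feature <- rownames(imp)
rownames(imp_df) <- NULL

# Sort by Gini importance (largest first) and put Feature in front
imp_df <- imp_df[order(imp_df$MeanDecreaseGini, decreasing = TRUE),
                 c("Feature", "MeanDecreaseAccuracy", "MeanDecreaseGini")]
print(imp_df)
```

Because no column names are assigned by hand, each measure keeps the label it was given when the matrix was built.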

Assessing the performance

# Remove columns from the test set to match train set
HipKneeTest_QualRF <- HipKneeTest_Qual %>%
  select(-State, -FacilityName, -FacilityId, -PredictedReadmissionRate_HIP_KNEE)

# Ensure the target variable is a factor
HipKneeTest_QualRF$TargetCategory <- as.factor(HipKneeTest_QualRF$TargetCategory)

# Predict on the test set
pred_rf_qual <- predict(rf_grid_search_qual, newdata = HipKneeTest_QualRF)

# Calculate accuracy
accuracy_rf_qual <- mean(pred_rf_qual == HipKneeTest_QualRF$TargetCategory)
cat("Accuracy of the Random Forest Model:", accuracy_rf_qual, "\n")
## Accuracy of the Random Forest Model: 1
# Calculate the confusion matrix
conf_matrix_rf_qual <- confusionMatrix(pred_rf_qual, HipKneeTest_QualRF$TargetCategory)

# Print the confusion matrix
print(conf_matrix_rf_qual)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 917   0
##          1   0 916
##                                     
##                Accuracy : 1         
##                  95% CI : (0.998, 1)
##     No Information Rate : 0.5003    
##     P-Value [Acc > NIR] : < 2.2e-16 
##                                     
##                   Kappa : 1         
##                                     
##  Mcnemar's Test P-Value : NA        
##                                     
##             Sensitivity : 1.0000    
##             Specificity : 1.0000    
##          Pos Pred Value : 1.0000    
##          Neg Pred Value : 1.0000    
##              Prevalence : 0.5003    
##          Detection Rate : 0.5003    
##    Detection Prevalence : 0.5003    
##       Balanced Accuracy : 1.0000    
##                                     
##        'Positive' Class : 0         
## 
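
The headline statistics above follow directly from the four cell counts. As a quick check, the sketch below recomputes them in base R from counts copied by hand out of the printed matrix; the object names (`cm`, `nir`, and so on) are illustrative and not part of the original analysis.

```r
# Cell counts copied from the confusion matrix printed above
# (rows = prediction, columns = reference)
cm <- matrix(
  c(917,   0,
      0, 916),
  nrow = 2, byrow = TRUE,
  dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1"))
)

n <- sum(cm)

# Accuracy: correctly classified hospitals over all hospitals
accuracy <- sum(diag(cm)) / n

# "0" is the positive class in the output above
sensitivity <- cm["0", "0"] / sum(cm[, "0"])
specificity <- cm["1", "1"] / sum(cm[, "1"])

# No-information rate: share of the most frequent reference class
nir <- max(colSums(cm)) / n

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, nir = nir), 4))
```

The no-information rate of about 0.5003 is simply 917 / 1833, which confirms that the two classes are almost perfectly balanced in the test set.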
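# Predict class probabilities on the test set
# (this overwrites the class predictions stored in pred_rf_qual above)
pred_rf_qual <- predict(rf_grid_search_qual, newdata = HipKneeTest_QualRF, type = "prob")

# Extract the predicted probabilities for class "1", which is used as the positive class for the ROC curve
# (note that confusionMatrix above reported "0" as its positive class)
pred_prob_pos <- pred_rf_qual[, "1"]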

# Generate predictions object for ROCR
pred <- prediction(pred_prob_pos, HipKneeTest_QualRF$TargetCategory)

# Calculate and plot ROC curve
roc_curve <- performance(pred, "tpr", "fpr")
plot(roc_curve, main = "ROC Curve for Random Forest Model", col = "blue")

# Calculate AUC
auc <- performance(pred, "auc")@y.values[[1]]
print(paste("AUC:", auc))
## [1] "AUC: 1"
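
An accuracy and AUC of exactly 1 on held-out data is unusual, so it may be worth ruling out overlap between the training and test rows before relying on these figures. The test confusion matrix sums to 1,833 hospitals, and the null deviance of the logistic regression fitted later is reported on 1,832 degrees of freedom, which implies 1,833 training rows; a 50/50 split would also produce equal sizes, so this is only a prompt to check. The sketch below shows one such check on two small hypothetical data frames (`train_df` and `test_df`, with made-up IDs), keyed by the `FacilityId` column that the analysis drops before modelling.

```r
# Hypothetical stand-ins for the training and test sets
train_df <- data.frame(
  FacilityId     = c("A1", "A2", "A3", "A4"),
  TargetCategory = c(0, 1, 0, 1)
)
test_df <- data.frame(
  FacilityId     = c("A3", "A4", "B1"),
  TargetCategory = c(0, 1, 1)
)

# Facilities that appear in both sets
shared_ids <- intersect(train_df$FacilityId, test_df$FacilityId)

# Share of test rows whose facility was also seen during training
overlap_share <- mean(test_df$FacilityId %in% train_df$FacilityId)

cat("Shared facilities:", length(shared_ids), "\n")
cat("Share of test rows seen in training:", round(overlap_share, 3), "\n")
```

A share near zero would suggest the split is clean; a share near one would mean the test-set metrics mostly describe how well the model reproduces rows it has already seen.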

Logistic Regression (AC)

Creating the model

# Ensure factors have consistent levels between training and test sets
factor_columns <- sapply(HipKneeTrain_Qual, is.factor)

HipKneeTest_Qual <- HipKneeTest_Qual %>%
  mutate(across(where(is.factor), ~ factor(.x, levels = levels(HipKneeTrain_Qual[[cur_column()]]))))

# Separate predictors and response variable in the training set
x_train_qual <- HipKneeTrain_Qual %>% select(-c(State, FacilityName, FacilityId, PredictedReadmissionRate_HIP_KNEE, TargetCategory))
y_train_qual <- HipKneeTrain_Qual$TargetCategory

# Ensure the response variable is a factor
y_train_qual <- as.factor(y_train_qual)

# Separate predictors and response variable in the test set
x_test_qual <- HipKneeTest_Qual %>% select(-c(State, FacilityName, FacilityId, PredictedReadmissionRate_HIP_KNEE, TargetCategory))
y_test_qual <- HipKneeTest_Qual$TargetCategory

# Ensure the response variable is a factor
y_test_qual <- as.factor(y_test_qual)

# Define the train control for logistic regression
ctrl <- trainControl(method = "repeatedcv", 
                     number = 10, 
                     repeats = 5, 
                     verboseIter = FALSE)

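# Fit the logistic regression with repeated 10-fold cross-validation
# (x_train_qual is a tibble, which triggers the row-name deprecation warning below;
# passing as.data.frame(x_train_qual) instead should avoid it)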
logistic_model_qual <- train(
  x = x_train_qual,
  y = y_train_qual,
  method = "glm",
  trControl = ctrl,
  family = "binomial"
)
## Warning: Setting row names on a tibble is deprecated.
# Print the model summary
best_model <- logistic_model_qual$finalModel
summary(best_model)
## 
## Call:
## NULL
## 
## Coefficients:
##                                                   Estimate Std. Error z value
## (Intercept)                                     -1.847e+00  2.312e+00  -0.799
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE -6.833e-02  1.734e-02  -3.940
## EDV                                              2.205e-01  5.962e-02   3.698
## HCP_COVID_19                                    -1.134e-03  7.158e-03  -0.158
## IMM_3                                            4.506e-03  3.800e-03   1.186
## OP_18b                                           3.834e-03  1.341e-03   2.860
## OP_29                                           -2.306e-03  4.770e-03  -0.483
## SAFE_USE_OF_OPIOIDS                              5.648e-02  1.354e-02   4.170
## VTE_1                                            7.700e-03  6.521e-03   1.181
## Score_COMP_HIP_KNEE                              8.414e-01  1.118e-01   7.524
## Score_PSI_03                                     7.321e-02  1.142e-01   0.641
## Score_PSI_04                                    -1.764e-03  3.147e-03  -0.560
## Score_PSI_06                                     4.762e-01  1.206e+00   0.395
## Score_PSI_08                                     1.518e+00  6.751e+00   0.225
## Score_PSI_09                                    -5.014e-02  1.153e-01  -0.435
## Score_PSI_10                                     1.797e-01  1.522e-01   1.180
## Score_PSI_11                                     1.950e-02  1.861e-02   1.048
## Score_PSI_12                                     5.651e-02  7.326e-02   0.771
## Score_PSI_13                                     1.588e-02  5.563e-02   0.285
## Score_PSI_14                                     6.572e-02  1.531e-01   0.429
## Score_PSI_15                                    -1.006e-01  1.714e-01  -0.587
## Payment_PAYM_90_HIP_KNEE                         1.822e-04  3.339e-05   5.457
## StateCode002                                    -1.307e+00  1.193e+00  -1.095
## StateCode003                                     3.462e-01  5.489e-01   0.631
## StateCode004                                    -6.162e-01  6.064e-01  -1.016
## StateCode005                                    -4.056e-01  4.366e-01  -0.929
## StateCode006                                    -6.218e-01  5.710e-01  -1.089
## StateCode007                                     2.884e-01  6.435e-01   0.448
## StateCode008                                    -2.822e-01  1.071e+00  -0.263
## StateCode009                                     1.072e+00  4.670e-01   2.295
## StateCode010                                    -1.437e-02  5.263e-01  -0.027
## StateCode011                                    -1.219e+00  1.339e+00  -0.910
## StateCode012                                     1.086e+00  7.641e-01   1.421
## StateCode013                                     6.523e-02  4.693e-01   0.139
## StateCode014                                     4.823e-01  4.856e-01   0.993
## StateCode015                                     1.134e-01  6.167e-01   0.184
## StateCode016                                     2.773e-01  5.495e-01   0.505
## StateCode017                                     2.265e+00  7.402e-01   3.060
## StateCode018                                     4.841e-01  5.615e-01   0.862
## StateCode019                                     9.728e-01  7.762e-01   1.253
## StateCode020                                    -3.363e-02  5.647e-01  -0.060
## StateCode021                                    -2.592e-01  5.296e-01  -0.489
## StateCode022                                     7.436e-01  4.834e-01   1.538
## StateCode023                                     1.053e+00  5.646e-01   1.865
## StateCode024                                     3.460e-01  6.349e-01   0.545
## StateCode025                                     3.809e-01  5.084e-01   0.749
## StateCode026                                    -1.602e-01  8.312e-01  -0.193
## StateCode027                                    -1.120e-01  6.645e-01  -0.169
## StateCode028                                    -1.276e+00  7.384e-01  -1.728
## StateCode029                                     4.195e-01  7.651e-01   0.548
## StateCode030                                     1.466e-01  5.294e-01   0.277
## StateCode031                                    -1.684e+00  9.110e-01  -1.848
## StateCode032                                    -7.965e-01  4.952e-01  -1.608
## StateCode033                                     5.028e-06  4.966e-01   0.000
## StateCode034                                     7.514e-01  1.059e+00   0.710
## StateCode035                                     4.873e-01  4.684e-01   1.040
## StateCode036                                     5.238e-01  5.310e-01   0.986
## StateCode037                                    -4.222e-01  6.678e-01  -0.632
## StateCode038                                    -2.461e-01  4.588e-01  -0.536
## StateCode039                                     4.939e-01  1.087e+00   0.454
## StateCode040                                    -3.776e-01  5.760e-01  -0.656
## StateCode041                                    -4.948e-02  7.898e-01  -0.063
## StateCode042                                     1.655e-01  5.224e-01   0.317
## StateCode043                                    -2.144e-01  4.391e-01  -0.488
## StateCode044                                    -2.893e-01  6.123e-01  -0.473
## StateCode045                                    -5.249e-01  9.719e-01  -0.540
## StateCode046                                    -8.265e-02  5.080e-01  -0.163
## StateCode047                                     1.423e-01  5.400e-01   0.264
## StateCode048                                     5.539e-02  7.043e-01   0.079
## StateCode049                                     5.183e-01  5.224e-01   0.992
## StateCode050                                     2.770e-01  9.199e-01   0.301
## Score_Ovr_MORT                                  -1.848e-01  5.517e-02  -3.350
##                                                 Pr(>|z|)    
## (Intercept)                                     0.424406    
## HcahpsLinearMeanValue_H_HSP_RATING_LINEAR_SCORE 8.16e-05 ***
## EDV                                             0.000217 ***
## HCP_COVID_19                                    0.874170    
## IMM_3                                           0.235740    
## OP_18b                                          0.004239 ** 
## OP_29                                           0.628816    
## SAFE_USE_OF_OPIOIDS                             3.05e-05 ***
## VTE_1                                           0.237704    
## Score_COMP_HIP_KNEE                             5.32e-14 ***
## Score_PSI_03                                    0.521413    
## Score_PSI_04                                    0.575165    
## Score_PSI_06                                    0.692951    
## Score_PSI_08                                    0.822030    
## Score_PSI_09                                    0.663554    
## Score_PSI_10                                    0.237932    
## Score_PSI_11                                    0.294620    
## Score_PSI_12                                    0.440543    
## Score_PSI_13                                    0.775363    
## Score_PSI_14                                    0.667740    
## Score_PSI_15                                    0.557141    
## Payment_PAYM_90_HIP_KNEE                        4.83e-08 ***
## StateCode002                                    0.273589    
## StateCode003                                    0.528175    
## StateCode004                                    0.309501    
## StateCode005                                    0.352841    
## StateCode006                                    0.276138    
## StateCode007                                    0.654027    
## StateCode008                                    0.792183    
## StateCode009                                    0.021707 *  
## StateCode010                                    0.978218    
## StateCode011                                    0.362883    
## StateCode012                                    0.155286    
## StateCode013                                    0.889466    
## StateCode014                                    0.320628    
## StateCode015                                    0.854163    
## StateCode016                                    0.613856    
## StateCode017                                    0.002213 ** 
## StateCode018                                    0.388591    
## StateCode019                                    0.210069    
## StateCode020                                    0.952509    
## StateCode021                                    0.624621    
## StateCode022                                    0.123971    
## StateCode023                                    0.062241 .  
## StateCode024                                    0.585745    
## StateCode025                                    0.453738    
## StateCode026                                    0.847198    
## StateCode027                                    0.866149    
## StateCode028                                    0.083978 .  
## StateCode029                                    0.583438    
## StateCode030                                    0.781768    
## StateCode031                                    0.064576 .  
## StateCode032                                    0.107737    
## StateCode033                                    0.999992    
## StateCode034                                    0.477926    
## StateCode035                                    0.298110    
## StateCode036                                    0.323980    
## StateCode037                                    0.527227    
## StateCode038                                    0.591704    
## StateCode039                                    0.649683    
## StateCode040                                    0.512083    
## StateCode041                                    0.950047    
## StateCode042                                    0.751401    
## StateCode043                                    0.625343    
## StateCode044                                    0.636546    
## StateCode045                                    0.589109    
## StateCode046                                    0.870772    
## StateCode047                                    0.792155    
## StateCode048                                    0.937318    
## StateCode049                                    0.321140    
## StateCode050                                    0.763320    
## Score_Ovr_MORT                                  0.000809 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2541.1  on 1832  degrees of freedom
## Residual deviance: 2156.6  on 1761  degrees of freedom
## AIC: 2300.6
## 
## Number of Fisher Scoring iterations: 4
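
The deviance lines in the summary above can be checked by hand. The sketch below reuses only the numbers printed in that output to compute the likelihood-ratio statistic against the intercept-only model and to reproduce the reported AIC. The variable names are illustrative and not part of the original analysis.

```r
# Values copied from the glm summary output above
null_deviance  <- 2541.1; null_df  <- 1832
resid_deviance <- 2156.6; resid_df <- 1761

# Likelihood-ratio test of the fitted model against the intercept-only model
lr_stat <- null_deviance - resid_deviance   # drop in deviance
lr_df   <- null_df - resid_df               # number of estimated slopes
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# For a binary-response binomial glm, AIC = residual deviance + 2 * k,
# where k counts the slopes plus the intercept
n_coef    <- lr_df + 1
aic_check <- resid_deviance + 2 * n_coef

cat("LR statistic:", lr_stat, "on", lr_df, "df\n")
cat("LR p-value:", format.pval(lr_p), "\n")
cat("AIC check:", aic_check, "\n")
```

The deviance falls by 384.5 on 71 degrees of freedom, so the predictors jointly improve on the null model even though most individual state indicators are not significant on their own.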

Assessing performance

# Convert predicted scores to class labels using a 0.5 threshold
# (this assumes `predictions` holds P(class = 1) on the 0-1 scale)
predicted_labels <- ifelse(predictions > 0.5, 1, 0) 

# Convert predicted labels to factors with the same levels as y_test_qual
predicted_labels <- factor(predicted_labels, levels = levels(y_test_qual))

# Calculate accuracy
accuracy <- mean(predicted_labels == y_test_qual)
print(paste("Accuracy: ", accuracy))
## [1] "Accuracy:  0.499727223131478"
# Create a confusion matrix
conf_matrix <- caret::confusionMatrix(predicted_labels, y_test_qual)
print(conf_matrix)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0   0   0
##          1 917 916
##                                           
##                Accuracy : 0.4997          
##                  95% CI : (0.4766, 0.5229)
##     No Information Rate : 0.5003          
##     P-Value [Acc > NIR] : 0.5279          
##                                           
##                   Kappa : 0               
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.0000          
##             Specificity : 1.0000          
##          Pos Pred Value :    NaN          
##          Neg Pred Value : 0.4997          
##              Prevalence : 0.5003          
##          Detection Rate : 0.0000          
##    Detection Prevalence : 0.0000          
##       Balanced Accuracy : 0.5000          
##                                           
##        'Positive' Class : 0               
## 
# ROC and AUC
roc_curve <- roc(y_test_qual, predictions)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc <- auc(roc_curve)
print(paste("AUC: ", auc))
## [1] "AUC:  0.761577766877946"
# Plot ROC curve
plot(roc_curve, main = "ROC Curve for Logistic Regression", col = "blue")
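
The confusion matrix above assigns every test hospital to class 1 at the 0.5 cutoff, yet the AUC of about 0.76 shows that the scores still rank hospitals usefully. One possible explanation is that `predictions` is not centered near 0.5; this is an assumption, since the prediction call is not shown in this section. The sketch below uses simulated scores (not the study data) to illustrate how a cutoff could be chosen from the scores themselves using Youden's J statistic (sensitivity + specificity - 1) in base R. All names are hypothetical.

```r
set.seed(42)

# Simulated stand-ins: true labels and scores that mostly exceed 0.5
# but are still higher, on average, for class 1
truth  <- rep(c(0, 1), each = 200)
scores <- c(rnorm(200, mean = 0.70, sd = 0.08),
            rnorm(200, mean = 0.80, sd = 0.08))

# With a fixed 0.5 cutoff nearly everything is labelled 1
acc_fixed <- mean(ifelse(scores > 0.5, 1, 0) == truth)

# Evaluate Youden's J at each candidate cutoff
cutoffs <- sort(unique(scores))
youden <- sapply(cutoffs, function(cut) {
  pred <- as.integer(scores >= cut)
  sens <- mean(pred[truth == 1] == 1)   # true positive rate for class 1
  spec <- mean(pred[truth == 0] == 0)   # true negative rate for class 0
  sens + spec - 1
})

best_cut <- cutoffs[which.max(youden)]
acc_best <- mean(as.integer(scores >= best_cut) == truth)

cat("Accuracy at 0.5 cutoff:   ", round(acc_fixed, 3), "\n")
cat("Chosen cutoff:            ", round(best_cut, 3), "\n")
cat("Accuracy at chosen cutoff:", round(acc_best, 3), "\n")
```

In practice the cutoff would be chosen on the training or validation data and then applied unchanged to the test set, so that the reported test accuracy is not optimistic.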

# Extract the coefficients from the best model
coefficients <- coef(logistic_model_qual$finalModel, s = best_lambda_qual)

# Convert coefficients to a dataframe
coefficients_df <- as.data.frame(as.matrix(coefficients))
coefficients_df$Feature <- rownames(coefficients_df)
coefficients_df <- coefficients_df %>% 
  mutate(Feature = gsub("\\.", " ", Feature)) %>% 
  rename(Importance = V1) %>%
  arrange(desc(abs(Importance))) %>%
  filter(Feature != "(Intercept)")  # Remove the intercept term if present

# Plot feature importance
ggplot(coefficients_df, aes(y = Importance, fill = Importance, x = fct_reorder(Feature, Importance))) +
  geom_col() + 
  scale_fill_continuous(low = "lightblue", high = "darkblue") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Feature Importance as Determined by Logistic Regression",
       x = "",
       y = "Coefficient Value", 
       fill = "Coefficient")

SVM (AC)

Creating the model

Selecting the kernel

# Convert x_train and x_test back to data frames
x_train_qual <- as.data.frame(x_train_qual)
x_test_qual <- as.data.frame(x_test_qual)

# Ensure all columns are numeric
x_train_qual[] <- lapply(x_train_qual, as.numeric)
x_test_qual[] <- lapply(x_test_qual, as.numeric)

# Combine the response and the predictors into a single training data frame
train_data_qual <- cbind(y_train_qual = y_train_qual, x_train_qual)

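# Define the grid for kernel types. Column names must match svm() arguments,
# so the column is `kernel`; a `.kernel` column is passed through `...` and ignored.
searchGrid_kernel <- expand.grid(
  kernel = c("linear", "polynomial", "radial", "sigmoid"),
  stringsAsFactors = FALSE
)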

# Train the SVM model with kernel tuning
svm_tune_kernel_qual <- tune(svm, 
                        y_train_qual ~ ., 
                        data = train_data_qual, 
                        ranges = searchGrid_kernel, 
                        tunecontrol = tune.control(
                          sampling = "cross", 
                          cross = 10
                        )
)

# Extract the best kernel (e1071 stores it as an integer code from 0 to 3)
best_kernel_qual <- svm_tune_kernel_qual$best.model$kernel
kernel_names <- c("Linear kernel", "Polynomial kernel",
                  "Radial kernel", "Sigmoid kernel")
kernel_description_qual <- if (best_kernel_qual %in% 0:3) {
  kernel_names[best_kernel_qual + 1]
} else {
  "Unknown kernel"
}

cat("Best Kernel Description:", kernel_description_qual, "\n")
## Best Kernel Description: Radial kernel

Tuning gamma and cost

# Define the grid for gamma
searchGrid_gamma <- expand.grid(
  gamma = c(0.01, 0.1, 1)
)

# Train the SVM model with gamma tuning
svm_tune_gamma_qual <- tune(svm, 
                       y_train_qual ~ ., 
                       data = train_data_qual, 
                       ranges = searchGrid_gamma, 
                       kernel = "radial", 
                       tunecontrol = tune.control(
                         sampling = "cross", 
                         cross = 10
                       )
)

# Extract the best gamma
best_gamma_qual <- svm_tune_gamma_qual$best.model$gamma
cat("Best Gamma:", best_gamma_qual, "\n")
## Best Gamma: 0.01
# Define the grid for cost (svm() names this argument `cost`, not `C`)
searchGrid_cost <- expand.grid(
  cost = c(0.1, 1, 10)
)

# Train the SVM model with cost tuning
svm_tune_cost_qual <- tune(svm, 
                      y_train_qual ~ ., 
                      data = train_data_qual, 
                      ranges = searchGrid_cost, 
                      kernel = "radial", 
                      tunecontrol = tune.control(
                        sampling = "cross", 
                        cross = 10
                      )
)

# Extract the best cost
best_cost_qual <- svm_tune_cost_qual$best.model$cost
cat("Best Cost:", best_cost_qual, "\n")
## Best Cost: 1
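
The steps above tune the kernel, gamma, and cost one at a time, holding the other settings at their defaults. Because gamma and cost interact for a radial kernel, searching them jointly is a common alternative. The sketch below shows one way to do this with `e1071::tune()`, using two classes of the built-in `iris` data as a stand-in for the hospital training set. Entries in `ranges` only take effect when their names match `svm()` arguments (`gamma`, `cost`).

```r
library(e1071)

set.seed(123)

# Two-class toy data standing in for the hospital training set
toy <- droplevels(subset(iris, Species != "setosa"))

# Search gamma and cost together; names must match svm() arguments
tuned <- tune(svm,
              Species ~ .,
              data = toy,
              kernel = "radial",
              ranges = list(gamma = c(0.01, 0.1, 1),
                            cost  = c(0.1, 1, 10)),
              tunecontrol = tune.control(sampling = "cross", cross = 5))

# One row per (gamma, cost) pair with its cross-validated error
print(tuned$performances)
print(tuned$best.parameters)
```

Checking that `tuned$performances` lists a different error for at least some rows is a quick way to confirm that the grid values actually reached the model.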

Creating the final model

# Final model with best parameters
# (the penalty argument of svm() is `cost`; an argument named `C` is ignored)
svm_final_qual <- svm(y_train_qual ~ ., 
                  data = train_data_qual,
                  kernel = "radial", 
                  cost = 1, 
                  gamma = 0.01,
                  probability = TRUE)

Assessing performance

# Predict on the test set
pred_test_svm_tuned <- predict(svm_final_qual, x_test_qual, probability = TRUE)

# Create the confusion matrix
confMat_tuned <- caret::confusionMatrix(pred_test_svm_tuned, y_test_qual)
print(confMat_tuned)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 603 274
##          1 314 642
##                                           
##                Accuracy : 0.6792          
##                  95% CI : (0.6573, 0.7006)
##     No Information Rate : 0.5003          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.3584          
##                                           
##  Mcnemar's Test P-Value : 0.1078          
##                                           
##             Sensitivity : 0.6576          
##             Specificity : 0.7009          
##          Pos Pred Value : 0.6876          
##          Neg Pred Value : 0.6715          
##              Prevalence : 0.5003          
##          Detection Rate : 0.3290          
##    Detection Prevalence : 0.4785          
##       Balanced Accuracy : 0.6792          
##                                           
##        'Positive' Class : 0               
## 
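
The statistics in the confusion matrix above follow directly from its four counts. The base R sketch below recomputes them as a check, with class 0 as the positive class, as caret reports. The object names are illustrative only.

```r
# Counts copied from the SVM confusion matrix above
# (rows = predicted class, columns = reference class)
cm <- matrix(c(603, 314, 274, 642), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))
n <- sum(cm)

# Class "0" is the positive class in the caret output
accuracy    <- sum(diag(cm)) / n
sensitivity <- cm["0", "0"] / sum(cm[, "0"])   # 603 / 917
specificity <- cm["1", "1"] / sum(cm[, "1"])   # 642 / 916
ppv         <- cm["0", "0"] / sum(cm["0", ])   # 603 / 877
npv         <- cm["1", "1"] / sum(cm["1", ])   # 642 / 956

# Cohen's kappa: observed agreement corrected for chance agreement
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, ppv = ppv, npv = npv,
              kappa = cohen_kappa), 4))
```

Because the two classes are almost perfectly balanced (917 versus 916), chance agreement is close to 0.5 and kappa is roughly twice the gain in accuracy over the no-information rate.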
# Extract decision values for ROC curve
decision_values <- attributes(predict(svm_final_qual, x_test_qual, decision.values = TRUE))$decision.values
pred <- prediction(decision_values, y_test_qual)
roc_curve <- performance(pred, "tpr", "fpr")

# Plot ROC curve
plot(roc_curve, main = "ROC Curve for Tuned SVM", col = "blue")

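# Calculate AUC from the class probabilities.
# R indexing starts at 1, so the original `[, c(0, 1)]` only ever selected
# column 1; check colnames() of the probability matrix to see which class that is.
# roc() picks the comparison direction automatically, so the AUC is unaffected.
pred_prob <- attr(predict(svm_final_qual, x_test_qual, probability = TRUE), "probabilities")[, 1]
auc_value <- roc(y_test_qual, pred_prob)$auc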
## Setting levels: control = 0, case = 1
## Setting direction: controls > cases
print(paste("AUC:", auc_value))
## [1] "AUC: 0.749077350197387"
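
The pROC messages report `controls < cases` for the logistic regression but `controls > cases` for the SVM, which indicates that the SVM score passed to `roc()` is higher for class 0 than for class 1. Because `roc()` chooses the direction automatically, the reported AUC is the same either way. The base R sketch below uses simulated scores (not the study data) and a hypothetical helper, `rank_auc()`, to show that scoring by P(class 0) instead of P(class 1) simply mirrors the AUC around 0.5.

```r
# AUC via the Mann-Whitney rank formula: the probability that a randomly
# chosen case (label 1) scores higher than a randomly chosen control (label 0)
rank_auc <- function(labels, scores) {
  r  <- rank(scores)                 # average ranks handle ties
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(1)
labels   <- rep(c(0, 1), each = 100)
p_class1 <- plogis(rnorm(200, mean = ifelse(labels == 1, 1, -1)))
p_class0 <- 1 - p_class1

auc_class1 <- rank_auc(labels, p_class1)   # scores rise with class 1
auc_class0 <- rank_auc(labels, p_class0)   # scores fall with class 1

cat("AUC using P(class 1):", round(auc_class1, 3), "\n")
cat("AUC using P(class 0):", round(auc_class0, 3), "\n")
cat("Sum of the two:", auc_class1 + auc_class0, "\n")
```

An AUC computed without direction handling that falls well below 0.5 is therefore usually a sign that the score column refers to the other class, not that the model ranks worse than chance.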